Systems for cell shape estimation

ABSTRACT

The present disclosure is directed, among other things, to automated systems and methods for analyzing, storing, and/or retrieving information associated with biological objects including lymphocytes. In some embodiments, a shape metric is derived for each detected and segmented lymphocyte and the shape metric is stored along with other relevant data.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of International Application PCT/EP2019/059181, entitled “SYSTEMS FOR CELL SHAPE ESTIMATION” and filed Apr. 11, 2019, which claims priority to U.S. Provisional Patent Application No. 62/657,509, filed on Apr. 13, 2018. Each of these applications is hereby incorporated by reference herein in its entirety and for all purposes.

BACKGROUND

Digital pathology involves scanning of whole histopathology or cytopathology glass slides into digital images interpretable on a computer screen. These images are later processed by an imaging algorithm or interpreted by a pathologist. To examine tissue sections (which are virtually transparent), tissue sections are prepared using colored histochemical stains that bind selectively to cellular components. Color-enhanced, or stained, cellular structures are used by clinicians or a computer-aided diagnosis (CAD) algorithm to identify morphological markers of a disease, and to proceed with therapy accordingly. Observing the assay enables a variety of processes, including diagnosis of disease, assessment of response to treatment, and development of new drugs to fight disease.

Immunohistochemical (IHC) slide staining can be utilized to identify one or more proteins in cells of a tissue section and hence is widely used in the study of different types of cells, such as cancerous cells and immune cells in biological tissue. Thus, IHC staining may be used in research to understand the distribution and localization of the differentially expressed biomarkers of immune cells (such as T-cells or B-cells) in a cancerous tissue for an immune response study. For example, tumors often include infiltrates of immune cells, which may prevent the development of tumors or favor the outgrowth of tumors.

Lymphocytes (T cells), especially CD8 cytotoxic T cells, are a key part of anti-tumor immunity. To mount an effective immune response, T cells must complete several distinct steps. First, T lymphocytes need to be fully activated by mature dendritic cells in the tumor-draining lymph node. Second, cancer-specific effector T cells must enter the tumor after leaving the blood vessels. Finally, tumor-infiltrating lymphocytes (TIL) need to perform their function, which ultimately leads to tumor regression. However, it is clearly recognized that tumors may escape T cell attack by a variety of mechanisms. One of them could be the location of T cells within a tumor. Thus, in most human solid tumors, T cells are rarely in contact with cancer cells but greatly enriched in the stroma, a surrounding microenvironment composed of non-cancer cells along with the extracellular matrix (ECM). An absence of T cell infiltration into tumor islets might constitute a major obstacle for T cell-mediated anti-tumor activities.

The trafficking of T cells is a key process to allow and regulate their immune-surveillance duties. Indeed, the high motility capabilities of immune cells are coupled to their ability to detect and eliminate pathogens and tumors. Trafficking of cells to the site of disease is a critical step for a successful immune response against pathogens and cancer. In the cancer setting, the presence of tumor-infiltrating lymphocytes (TIL) has been reported to correlate well with positive clinical outcomes. On the other hand, and in some tumor subtypes (e.g. luminal breast cancer), TILs are associated with poor prognosis.

Chemokines can attract T cells to the tumor site and tumor intrinsic pathways can influence the composition of local chemokines. On the other hand, tumor-induced vasculature can hamper T cell migration. Moreover, other immune cells and tumor-derived molecules can block T cell proliferation and survival.

BRIEF SUMMARY

Immunotherapies with tumor infiltrating lymphocytes or other agents (e.g. checkpoint inhibitors) are promising approaches being widely investigated for the treatment of cancers. Detecting lymphocytes in stained histological tissue images is a critical step in the clinical studies. The quantification of lymphocytes provides one solution to quantify the immune response so that researchers can analyze the treatment outcome of immunotherapy quantitatively. In addition to understanding the density and spatial arrangement of lymphocytes with respect to a tumor or to individual tumor cells, Applicant submits that understanding the motility of lymphocytes allows for a superior understanding of whether a candidate patient will respond well to therapy.

The present disclosure relates, among other things, to automated systems and methods for analyzing images of a biological sample stained with one or more stains, identifying lymphocytes within the stained biological sample, and deriving one or more shape metrics for each identified lymphocyte. In some embodiments, the one or more derived shape metrics serve as a surrogate for lymphocyte motility, with those lymphocytes having a circular or nearly circular shape being indicative of lymphocytes that are likely not motile, while those lymphocytes having an elongate shape being indicative of lymphocytes that are more likely to be motile.

In one aspect of the present disclosure is a method of processing image analysis data derived from an image of a biological specimen stained for the presence of at least one lymphocyte biomarker, the method comprising: (a) detecting lymphocytes in the image; (b) computing a foreground segmentation mask based on the detected lymphocytes within the image; (c) identifying outlines of the detected lymphocytes in the image by filtering the image with the computed foreground segmentation mask; (d) deriving a shape metric based on an outline of each of the segmented lymphocytes; and (e) associating at least the derived shape metric for each detected lymphocyte with coordinates for each detected lymphocyte. In some embodiments, the method further comprises retrieving stored coordinates and associated shape metric data from a database and projecting the retrieved data onto the image. In some embodiments, the method further comprises unmixing the image (e.g., a multiplex image) of the biological specimen into individual image channel images, each image channel image representing signals corresponding to a single stain (e.g. a first lymphocyte biomarker stain channel, a second lymphocyte biomarker stain channel, a hematoxylin channel, etc.).
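By way of non-limiting illustration only, the association of a derived shape metric with lymphocyte coordinates, and the later retrieval of that data for projection onto the image, might be sketched as follows. The sketch assumes an in-memory SQLite database; the table name, column names, function names, and sample values are hypothetical and are not taken from the disclosure.

```python
import sqlite3


def store_shape_metrics(conn, records):
    """Store (x, y, shape_metric) rows. The schema is a hypothetical example."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lymphocytes ("
        "id INTEGER PRIMARY KEY, x REAL, y REAL, shape_metric REAL)"
    )
    conn.executemany(
        "INSERT INTO lymphocytes (x, y, shape_metric) VALUES (?, ?, ?)", records
    )
    conn.commit()


def fetch_in_region(conn, x_min, y_min, x_max, y_max):
    """Retrieve stored coordinates and shape metrics inside a rectangular region."""
    cursor = conn.execute(
        "SELECT x, y, shape_metric FROM lymphocytes "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id",
        (x_min, x_max, y_min, y_max),
    )
    return cursor.fetchall()


# Illustrative values only: three detected lymphocytes with made-up metrics.
conn = sqlite3.connect(":memory:")
store_shape_metrics(
    conn, [(10.0, 12.0, 0.95), (40.0, 8.0, 0.42), (200.0, 150.0, 0.71)]
)
rows = fetch_in_region(conn, 0, 0, 100, 100)
```

Keying each record to image coordinates allows the retrieved metrics to be drawn back onto the same image region at a later time.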

In some embodiments, the shape metric is a minor axis/major axis aspect ratio, an eccentricity parameter, a circularity parameter, a roundness parameter, or a solidity parameter. In some embodiments, the minor axis/major axis aspect ratio is derived by: (i) fitting an ellipse to the outline of each of the segmented lymphocytes; (ii) calculating a length of the fitted ellipse's minor axis and major axis; and (iii) calculating an aspect ratio between the calculated lengths of the minor and major axes. In some embodiments, the ellipse is fitted to the outline of each of the segmented lymphocytes by performing a Hough transform or a Randomized Hough Transform.
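As a non-limiting sketch of steps (i) through (iii), the following example derives a minor axis/major axis aspect ratio from outline points. It does not implement a Hough transform; instead it substitutes a simpler moment-based ellipse fit (eigenvalues of the covariance of the outline points), which is an assumption made purely for illustration. The function names are hypothetical, and the absolute axis lengths are exact only when the outline points are evenly spaced in the ellipse parameter.

```python
import math


def fit_ellipse_axes(outline):
    """Return (major, minor) axis lengths of a moment-matched ellipse.

    outline: list of (x, y) points along a cell outline.
    """
    n = len(outline)
    cx = sum(x for x, _ in outline) / n
    cy = sum(y for _, y in outline) / n
    sxx = sum((x - cx) ** 2 for x, _ in outline) / n
    syy = sum((y - cy) ** 2 for _, y in outline) / n
    sxy = sum((x - cx) * (y - cy) for x, y in outline) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    diff = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_max = mean + diff
    lam_min = max(mean - diff, 0.0)
    # Points on an ellipse with semi-axis a have variance a^2 / 2 along that
    # axis (for even parameter spacing), so the full axis is 2 * sqrt(2 * lam).
    return 2.0 * math.sqrt(2.0 * lam_max), 2.0 * math.sqrt(2.0 * lam_min)


def minor_major_aspect_ratio(outline):
    """Aspect ratio in [0, 1]; values near 1 indicate a nearly circular outline."""
    major, minor = fit_ellipse_axes(outline)
    if major == 0:
        raise ValueError("degenerate outline: all points coincide")
    return minor / major


# Synthetic outlines: an ellipse with semi-axes 10 and 5, and a circle of radius 7.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
ellipse = [(50 + 10 * math.cos(t), 30 + 5 * math.sin(t)) for t in angles]
circle = [(7 * math.cos(t), 7 * math.sin(t)) for t in angles]
ellipse_ratio = minor_major_aspect_ratio(ellipse)
circle_ratio = minor_major_aspect_ratio(circle)
```

Under this convention an elongate outline yields a ratio well below 1, while a circular outline yields a ratio close to 1.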

In some embodiments, the roundness parameter is derived by: (i) fitting an ellipse to the outline of each of the segmented lymphocytes; (ii) calculating a length of the fitted ellipse's major axis; (iii) deriving an area of the outline of each of the segmented lymphocytes; and (iv) calculating 4 × [the derived area] / (π × [the calculated length of the ellipse's major axis]²).
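The roundness calculation may be illustrated with the following non-limiting sketch, which reads the expression in step (iv) as 4 × area / (π × major axis²), so that a perfect circle yields a value of 1. The outline area is computed here with the shoelace formula, which is an assumption of the sketch; the major axis length is passed in as if it came from a prior ellipse fit. Function names are hypothetical.

```python
import math


def polygon_area(outline):
    """Shoelace formula for a closed outline given as ordered (x, y) vertices."""
    total = 0.0
    n = len(outline)
    for i in range(n):
        x0, y0 = outline[i]
        x1, y1 = outline[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def roundness(area, major_axis_length):
    """Roundness = 4 * area / (pi * major_axis_length ** 2)."""
    if major_axis_length <= 0:
        raise ValueError("major axis length must be positive")
    return 4.0 * area / (math.pi * major_axis_length ** 2)


# Synthetic outlines with a known major axis length of 20 in both cases.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
circle = [(10 * math.cos(t), 10 * math.sin(t)) for t in angles]
ellipse = [(10 * math.cos(t), 5 * math.sin(t)) for t in angles]
circle_roundness = roundness(polygon_area(circle), 20.0)
ellipse_roundness = roundness(polygon_area(ellipse), 20.0)
```

For the synthetic ellipse with semi-axes 10 and 5, the roundness is approximately 0.5, matching its minor/major axis ratio.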

In some embodiments, the biological sample is stained for the presence of CD8 cytotoxic T cells (e.g. stained for the presence of CD8), regulatory T cells (e.g. stained for the presence of FOXP3), and/or for helper T cells (e.g. stained for the presence of CD4). In some embodiments, multiple biomarkers are introduced such that a combination of certain biomarkers leads to double staining of cells and thus a sub-classification. For example, a tissue section could be stained for CD8 and Ki67 (proliferation marker). This would then allow for classification of CD8 positive T cells into proliferating and non-proliferating as well as sub-classifying the cells further into, for example, proliferating and mobile (i.e. elongated) T cells.

In some embodiments, the method further comprises classifying each of the detected lymphocytes, such as within a predefined area (e.g. a region of a tissue sample, an entire tissue area, a whole slide). In some embodiments, detected lymphocytes are classified as cytotoxic T-lymphocytes, regulatory T-cells, or T-helper cells. In some embodiments, the method further comprises detecting a cell density for each type of classified lymphocyte. In some embodiments, the method further comprises quantitatively determining the number of cells positive for at least one marker selected from the group consisting of CD8, CD4, FOXP3, CD45RA, and CD45RO.

In some embodiments, the method further comprises comparing a value of the derived shape metric to a predetermined threshold value for the particular derived shape metric and assigning a predictive cell motility label to the detected lymphocyte based on the comparison. In some embodiments, the value of the derived shape metric is compared to a series of ranges of predetermined threshold values and wherein each detected lymphocyte is assigned one of a plurality of cell motility labels based on the comparison.
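A non-limiting sketch of comparing a shape metric against a series of threshold ranges is given below. The threshold values (0.5 and 0.8), the label names, and the choice of the minor/major aspect ratio as the metric are hypothetical assumptions for illustration; the disclosure does not specify particular values.

```python
from bisect import bisect_right

# Hypothetical thresholds on a minor/major aspect ratio in [0, 1].
THRESHOLDS = (0.5, 0.8)
LABELS = ("likely motile", "intermediate", "likely non-motile")


def motility_label(aspect_ratio, thresholds=THRESHOLDS, labels=LABELS):
    """Assign one of len(thresholds) + 1 labels based on the range the value falls in."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_right returns the index of the range containing the value;
    # a value equal to a threshold falls into the range above it.
    return labels[bisect_right(thresholds, aspect_ratio)]


assigned = [motility_label(v) for v in (0.3, 0.5, 0.79, 0.95)]
```

A single-threshold comparison is the special case of one threshold and two labels.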

In some embodiments, the method further comprises generating a representational object for each detected lymphocyte and overlaying the representational objects onto the detected lymphocytes in the image. In some embodiments, the representational objects are seed points, and wherein each seed point is assigned a color corresponding to one of a plurality of assigned cell motility labels. In some embodiments, the representational objects are filled outlines of each segmented lymphocyte, and wherein each filled outline is assigned a color corresponding to one of a plurality of assigned cell motility labels.
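One way such a seed point overlay might be generated is sketched below, purely as a non-limiting example. The image is modeled as rows of RGB tuples, and the label names and the specific colors assigned to them are hypothetical assumptions.

```python
# Hypothetical label-to-color assignment (RGB).
LABEL_COLORS = {
    "likely non-motile": (0, 0, 139),   # dark blue
    "intermediate": (0, 128, 0),        # green
    "likely motile": (139, 0, 0),       # deep red
}


def overlay_seed_points(image, cells, radius=1):
    """Return a copy of the image with a colored square drawn at each seed point.

    image: list of rows, each row a list of (r, g, b) tuples.
    cells: iterable of (x, y, label) with integer pixel coordinates.
    """
    out = [list(row) for row in image]
    height, width = len(out), len(out[0])
    for x, y, label in cells:
        color = LABEL_COLORS[label]
        for yy in range(max(0, y - radius), min(height, y + radius + 1)):
            for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                out[yy][xx] = color
    return out


white = (255, 255, 255)
image = [[white] * 5 for _ in range(5)]
overlay = overlay_seed_points(image, [(2, 2, "likely motile")], radius=0)
```

The input image is left unmodified so that the overlay can be toggled or regenerated with different labels.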

In some embodiments, the method further comprises detecting and classifying tumor cells within the image. In some embodiments, the biological sample is stained for the presence of a PD-L1 biomarker and wherein an expression score is derived based on the number of tumor cells and lymphocytes expressing the PD-L1 biomarker. In some embodiments, the method further comprises classifying the detected lymphocytes as tumor-infiltrating lymphocytes. In some embodiments, Pan cytokeratin is utilized for the identification of epithelial tumor cells (e.g. PD-L1 can be expressed both in tumor and immune cells in general (not only lymphocytes)).

In another aspect of the present disclosure is a system for processing image analysis data derived from an image of a biological specimen stained for the presence of at least one lymphocyte biomarker, the system comprising: (i) one or more processors, and (ii) a memory coupled to the one or more processors, the memory to store computer-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: detecting lymphocytes in the image of the stained biological sample; identifying outlines of the detected lymphocytes by segmenting the detected lymphocytes from other cells within the image; deriving a shape metric based on the identified outlines of each of the detected lymphocytes; and associating at least the derived metrics for each detected lymphocyte with lymphocyte location information. In some embodiments, the associated metrics and information may be stored in a database.

In some embodiments, the shape metric is a minor axis/major axis aspect ratio, an eccentricity parameter, a circularity parameter, a roundness parameter, or a solidity parameter. In some embodiments, the minor axis/major axis aspect ratio is derived by: (i) fitting an ellipse to the outline of each of the segmented lymphocytes; (ii) calculating a length of the fitted ellipse's minor axis and major axis; and (iii) calculating an aspect ratio between the calculated lengths of the minor and major axes. In some embodiments, the ellipse is fitted to the outline of each of the segmented lymphocytes by performing a Hough transform or a Randomized Hough Transform.

In some embodiments, the system further comprises instructions for classifying each of the detected lymphocytes within a predefined area of the image. In some embodiments, the detected lymphocytes are classified as cytotoxic T-lymphocytes, regulatory T-cells, or T-helper cells.

In some embodiments, the system further comprises instructions for comparing a value of the derived shape metric to a predetermined threshold value for the particular derived shape metric and assigning a predictive cell motility label to the detected lymphocyte based on the comparison. In some embodiments, the value of the derived shape metric is compared to a series of ranges of predetermined threshold values and wherein each detected lymphocyte is assigned one of a plurality of cell motility labels based on the comparison.

In some embodiments, the system further comprises instructions for generating a representational object for each detected lymphocyte and overlaying the representational objects onto the detected lymphocytes in the image. In some embodiments, the representational objects are seed points, and wherein each seed point is assigned a color corresponding to one of the assigned plurality of cell motility labels.

In another aspect of the present disclosure is a non-transitory computer-readable medium storing instructions for estimating shapes of lymphocytes in a biological sample stained for at least the presence of the lymphocytes comprising: detecting lymphocytes in the image of the stained biological sample; identifying outlines of the detected lymphocytes by segmenting the detected lymphocytes from other cells within the image; and deriving a shape metric based on the identified outlines of each of the detected lymphocytes. In some embodiments, the non-transitory computer-readable medium further comprises instructions for storing the derived shape metrics for each of the detected lymphocytes along with an x,y coordinate position of the detected lymphocyte from the image.

In some embodiments, the shape metric is a minor axis/major axis aspect ratio, an eccentricity parameter, a circularity parameter, a roundness parameter, or a solidity parameter. In some embodiments, the minor axis/major axis aspect ratio is derived by: (i) fitting an ellipse to the outline of each of the segmented lymphocytes; (ii) calculating a length of the fitted ellipse's minor axis and major axis; and (iii) calculating an aspect ratio between the calculated lengths of the minor and major axes. In some embodiments, the ellipse is fitted to the outline of each of the segmented lymphocytes by performing a Hough transform or a Randomized Hough Transform.

In some embodiments, the non-transitory computer-readable medium further comprises instructions for comparing a value of the derived shape metric to a predetermined threshold value for the particular derived shape metric and assigning a predictive cell motility label to the detected lymphocyte based on the comparison. In some embodiments, the value of the derived shape metric is compared to a series of ranges of predetermined threshold values and wherein each detected lymphocyte is assigned one of a plurality of cell motility labels based on the comparison.

In some embodiments, the non-transitory computer-readable medium further comprises instructions for generating a representational object for each detected lymphocyte and overlaying the representational objects onto the detected lymphocytes in the image. In some embodiments, the representational objects are seed points, and wherein each seed point is assigned a color corresponding to one of the assigned plurality of cell motility labels.

Applicants submit that the present disclosure enables detecting lymphocytes, estimating their relative shape, and associating the relative shape information with location information in a more accurate and/or more efficient manner than could be performed by a pathologist. Applicants further believe that the systems and methods described herein enable the collection and analysis of data at high speeds, such that data may be processed from a whole slide image, not just a portion thereof. Thus, the presently disclosed systems are not only novel, but facilitate the efficient, high speed processing of data from biological samples.

BRIEF DESCRIPTION OF THE FIGURES

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided to the Office upon request and the payment of the necessary fee. For a general understanding of the features of the disclosure, reference is made to the drawings. In the drawings, like reference numerals have been used throughout to identify identical elements.

FIG. 1 illustrates a representative digital pathology system including an image acquisition device and a computer system, in accordance with some embodiments.

FIG. 2 sets forth various modules that can be utilized in a digital pathology system or within a digital pathology workflow, in accordance with some embodiments.

FIG. 3A sets forth a flowchart illustrating the various steps of detecting lymphocytes and deriving a shape metric for each detected lymphocyte, in accordance with some embodiments.

FIG. 3B sets forth a flowchart illustrating the various steps of detecting lymphocytes and deriving a shape metric for each detected lymphocyte, in accordance with some embodiments.

FIG. 4 sets forth a flowchart illustrating one method of deriving a shape metric for detected lymphocytes, in accordance with some embodiments.

FIG. 5 illustrates a region of a tissue sample of colorectal cancer, stained with CD3/red and Perforin/DAB (brightfield). T-cells stain with a red membrane. This tissue region contains symmetric, round stained cells (single arrows) and elongated cells (double arrows).

FIG. 6 illustrates a region of a tissue sample stained with CD3/cyan, CD4/green, CD8/red, and non-T-cell markers in yellow and blue (fluorescence). T-cells stain with red, cyan, or green membranes. This tissue region contains symmetric, round stained cells (single arrows) and elongated cells (double arrows).

FIG. 7A illustrates a region of a tissue sample of colorectal cancer stained with CD3/red and Perforin/DAB. T-cells stain with a red membrane (brightfield).

FIG. 7B illustrates a segmentation and detection result for lymphocytes. The color overlay shows different cells in different colors.

FIG. 8A illustrates a region of a tissue sample of colorectal cancer, stained with CD3/red and Perforin/DAB. T-cells stain with a red membrane (brightfield).

FIG. 8B illustrates a segmentation and detection result for lymphocytes. Color coding indicates cell shape and motility from dark blue (stationary) to deep red (dynamic).

FIG. 9A illustrates a region of a tissue sample of colorectal cancer, stained with CD3/red and Perforin/DAB. T-cells stain with a red membrane (brightfield).

FIG. 9B illustrates a segmentation and detection result for lymphocytes. Color coding indicates cell shape and motility from dark blue (stationary) to deep red (dynamic).

DETAILED DESCRIPTION

It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.

As used herein, the singular terms “a,” “an,” and “the” include plural referents unless context clearly indicates otherwise. Similarly, the word “or” is intended to include “and” unless the context clearly indicates otherwise. The term “includes” is defined inclusively, such that “includes A or B” means including A, B, or A and B.

As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.

The terms “comprising,” “including,” “having,” and the like are used interchangeably and have the same meaning. Similarly, “comprises,” “includes,” “has,” and the like are used interchangeably and have the same meaning. Specifically, each of the terms is defined consistent with the common United States patent law definition of “comprising” and is therefore interpreted to be an open term meaning “at least the following,” and is also interpreted not to exclude additional features, limitations, aspects, etc. Thus, for example, “a device having components a, b, and c” means that the device includes at least components a, b and c. Similarly, the phrase: “a method involving steps a, b, and c” means that the method includes at least steps a, b, and c. Moreover, while the steps and processes may be outlined herein in a particular order, the skilled artisan will recognize that the ordering of steps and processes may vary.

As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

As used herein, the term “biological sample” or “tissue sample” refers to any sample including a biomolecule (such as a protein, a peptide, a nucleic acid, a lipid, a carbohydrate, or a combination thereof) that is obtained from any organism including viruses. Other examples of organisms include mammals (such as humans; veterinary animals like cats, dogs, horses, cattle, and swine; and laboratory animals like mice, rats and primates), insects, annelids, arachnids, marsupials, reptiles, amphibians, bacteria, and fungi. Biological samples include tissue samples (such as tissue sections and needle biopsies of tissue), cell samples (such as cytological smears such as Pap smears or blood smears or samples of cells obtained by microdissection), or cell fractions, fragments or organelles (such as obtained by lysing cells and separating their components by centrifugation or otherwise). Other examples of biological samples include blood, serum, urine, semen, fecal matter, cerebrospinal fluid, interstitial fluid, mucous, tears, sweat, pus, biopsied tissue (for example, obtained by a surgical biopsy or a needle biopsy), nipple aspirates, cerumen, milk, vaginal fluid, saliva, swabs (such as buccal swabs), or any material containing biomolecules that is derived from a first biological sample. In certain embodiments, the term “biological sample” as used herein refers to a sample (such as a homogenized or liquefied sample) prepared from a tumor or a portion thereof obtained from a subject.

As used herein, the terms “biomarker” or “marker” refer to a measurable indicator of some biological state or condition. In particular, a biomarker may be a protein or peptide, e.g. a surface protein, that can be specifically stained, and which is indicative of a biological feature of the cell, e.g. the cell type or the physiological state of the cell. An immune cell marker is a biomarker that is selectively indicative of a feature that relates to an immune response of a mammal. A biomarker may be used to determine how well the body responds to a treatment for a disease or condition or if the subject is predisposed to a disease or condition. In the context of cancer, a biomarker refers to a biological substance that is indicative of the presence of cancer in the body. A biomarker may be a molecule secreted by a tumor or a specific response of the body to the presence of cancer. Genetic, epigenetic, proteomic, glycomic, and imaging biomarkers can be used for cancer diagnosis, prognosis, and epidemiology. Such biomarkers can be assayed in non-invasively collected biofluids like blood or serum. Several gene and protein based biomarkers have already been used in patient care including, but not limited to, AFP (Liver Cancer), BCR-ABL (Chronic Myeloid Leukemia), BRCA1/BRCA2 (Breast/Ovarian Cancer), BRAF V600E (Melanoma/Colorectal Cancer), CA-125 (Ovarian Cancer), CA19.9 (Pancreatic Cancer), CEA (Colorectal Cancer), EGFR (Non-small-cell lung carcinoma), HER-2 (Breast Cancer), KIT (Gastrointestinal stromal tumor), PSA (Prostate Specific Antigen), S100 (Melanoma), and many others. Biomarkers may be useful as diagnostics (to identify early stage cancers) and/or prognostics (to forecast how aggressive a cancer is and/or predict how a subject will respond to a particular treatment and/or how likely a cancer is to recur).

A “foreground segmentation mask” is, for example, an image mask created by a segmentation algorithm that allows separating one or more pixel blobs (to be used as “foreground pixels”) from other pixels (constituting the “background”). For example, the foreground segmentation mask may be generated by a nuclear segmentation algorithm and the application of the foreground segmentation mask on an image depicting a tissue section may allow identification of nuclear blobs in an image.
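As a non-limiting illustration of how individual pixel blobs might be separated once a foreground segmentation mask is available, the following sketch labels connected foreground regions using a breadth-first search with 4-connectivity. The labeling approach and the function name are assumptions for illustration and are not stated to be the segmentation algorithm of the disclosure.

```python
from collections import deque


def label_blobs(mask):
    """Label 4-connected foreground blobs in a binary mask.

    mask: list of rows of truthy (foreground) or falsy (background) values.
    Returns (labels, count), where labels holds 0 for background and
    1..count for the blobs in scan order.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
labels, count = label_blobs(mask)
```

Each labeled blob can then be treated as one candidate object whose outline is passed to a shape metric calculation.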

As used herein, the term “image data” encompasses raw image data acquired from the biological sample, such as by means of an optical sensor or sensor array, or pre-processed image data. In particular, the image data may comprise a pixel matrix.

As used herein, the term “immunohistochemistry” refers to a method of determining the presence or distribution of an antigen in a sample by detecting interaction of the antigen with a specific binding agent, such as an antibody. A sample is contacted with an antibody under conditions permitting antibody-antigen binding. Antibody-antigen binding can be detected by means of a detectable label conjugated to the antibody (direct detection) or by means of a detectable label conjugated to a secondary antibody, which binds specifically to the primary antibody (indirect detection).

A “mask” as used herein is a derivative of a digital image wherein each pixel in the mask is represented as a binary value, e.g. “1” or “0” (or “true” or “false”). By overlaying a digital image with said mask, all pixels of the digital image mapped to a mask pixel of a particular one of the binary values are hidden, removed, or otherwise ignored or filtered out in further processing steps applied on the digital image. For example, a mask can be generated from an original digital image by assigning all pixels of the original image with an intensity value above a threshold to true and otherwise false, thereby creating a mask that will filter out all pixels overlaid by a “false” masked pixel.
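The thresholding example in the preceding definition can be sketched, in a non-limiting manner, as follows. The image is modeled as rows of grayscale intensities; the function names, the fill value of 0 for filtered-out pixels, and the sample intensities are assumptions for illustration.

```python
def threshold_mask(image, threshold):
    """Create a binary mask: True where intensity is above the threshold."""
    return [[pixel > threshold for pixel in row] for row in image]


def apply_mask(image, mask, fill=0):
    """Keep pixels under a True mask pixel; replace the rest with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


image = [
    [12, 200, 90],
    [180, 30, 255],
]
mask = threshold_mask(image, 100)
filtered = apply_mask(image, mask)
```

Pixels at or below the threshold map to “false” and are filtered out of subsequent processing.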

A “multi-channel image” as understood herein encompasses a digital image obtained from a biological tissue sample in which different biological structures, such as nuclei and tissue structures, are simultaneously stained with specific fluorescent dyes, quantum dots, chromogens, etc., each of which fluoresces or is otherwise detectable in a different spectral band thus constituting one of the channels of the multi-channel image.

Overview

Applicant has developed a system and method for analyzing images of stained biological specimens, including deriving shape metrics for lymphocytes identified within the images of the stained biological samples. The derived shape metrics may be stored along with other relevant data, e.g. the coordinates of the identified lymphocytes, an area measurement of each lymphocyte, a staining intensity of a lymphocyte biomarker, etc. In some embodiments, the shape metrics derived from the images may be projected onto the images, in the form of representational objects correlating a value of the derived shape metric to a likelihood that a particular lymphocyte is motile (e.g. a range of colors with each color representing a different likelihood that the lymphocyte is dynamic or motile). Applicants believe that the systems and methods of the present disclosure allow for predicting whether identified lymphocytes are motile. In some embodiments, the identification of motile lymphocytes allows for an indication of those lymphocytes that are capable of attacking tumor cells. Alternatively, the identification of non-motile lymphocytes allows for an indication of whether the identified lymphocytes are interacting with the tumor cells.

The tumor stroma consists of a variety of cell types that include endothelial cells, fibroblasts, pericytes, and immune subtypes such as lymphocytes, granulocytes, and macrophages. The profile of the tumor infiltrating lymphocytes (TILs) present within the tumor microenvironment reflects the diversity in tumor biology and host-tumor interactions. In various solid cancer settings, the frequency and type of TILs have been reported to correlate with outcomes in some patients, although this may vary according to tumor type. Nevertheless, improved antitumor responses have been shown to positively correlate with increased cytotoxic T lymphocyte (CTL) infiltration in various cancers, including colorectal, breast, cervical cancers, and glioblastoma. See Kim et al., “Tumor infiltrating lymphocytes, tumor characteristics, and recurrence in patients with early breast cancer,” Am J Clin Oncol 2013; 36: 224-31; Piersma et al., “High number of intraepithelial CD8+ tumor-infiltrating lymphocytes is associated with the absence of lymph node metastases in patients with large early-stage cervical cancer,” Cancer Res 2007; 67:354-61; and Kmiecik et al., “Elevated CD3+ and CD8+ tumor-infiltrating immune cells correlate with prolonged survival in glioblastoma patients despite integrated immunosuppressive mechanisms in the tumor microenvironment and at the systemic level,” J Neuroimmunol 2013; 264:71-83, the disclosures of which are hereby incorporated by reference herein in their entireties.

It is believed that naturally primed CTLs have the capacity to identify and eradicate malignant cells through recognition of tumor-associated antigens presented by MHCI. However, only a small number of CTLs are generally able to infiltrate the tumor site, which contrasts with the T cell infiltration process in inflammatory or infectious disease settings. CTL trafficking (i.e. their migration) is a tightly controlled process, and factors such as mismatching of chemokine-chemokine receptor pairs, downregulation of adhesion molecules, and aberrant vasculature may all contribute to the poor homing of these cells. Identification of motile lymphocytes is believed to serve as a prognostic indicator and to be useful for predicting whether a patient would benefit from treatment with a particular immunotherapy.

The systems and methods described herein facilitate distinguishing lymphocytes in a particular biological sample that are more likely to be motile from those that are not, thus providing prognostic information which may be used to make informed clinical decisions. It is believed that lymphocytes that are motile may have a shape that is more elongate and less round than a comparatively less motile lymphocyte. For example, FIG. 5 illustrates colorectal cancer cells chromogenically stained for the identification of cytotoxic T cells. FIG. 5 particularly illustrates that within the population of identified cytotoxic T cells, some are more circular than others, i.e. some are circular in appearance while others are more elongate in shape (e.g. having a substantially ovoid shape or having a substantially elliptical shape). FIG. 6 similarly illustrates a region of a tissue sample having various stains (fluorogenic stains) indicating the presence of cytotoxic T cells and helper T cells, where some of the T cells again have a round or symmetrical appearance, while others are more elongate (e.g. more elliptical in shape as opposed to circular in shape). As noted herein, those T cells that are motile have a less round (i.e. less circular) appearance; they may be visually recognized as elliptical or substantially elliptical. As such, the shape of a lymphocyte may serve as a surrogate for lymphocyte motility, as evidenced by FIGS. 5 and 6.

At least some embodiments of the present disclosure relate to computer systems and methods for analyzing digital images captured from biological samples, including tissue samples, stained with one or more primary stains (e.g. hematoxylin and eosin (H&E)) and one or more detection probes (e.g. probes containing a specific binding entity which facilitates the labeling of targets within the sample). While examples herein may refer to specific tissues and/or the application of specific stains or detection probes for the detection of certain markers (and hence diseases), the skilled artisan will appreciate that different tissues and different stains/detection probes may be applied to detect different markers and different diseases.

A digital pathology system 200 for imaging and analyzing specimens, in accordance with some embodiments, is illustrated in FIG. 1. The digital pathology system 200 may comprise an imaging apparatus 12 (e.g. an apparatus having means for scanning a specimen-bearing microscope slide) and a computer 14, whereby the imaging apparatus 12 and computer 14 may be communicatively coupled together (e.g. directly, or indirectly over a network 20). The computer system 14 can include a desktop computer, a laptop computer, a tablet, or the like, digital electronic circuitry, firmware, hardware, memory, a computer storage medium, a computer program or set of instructions (e.g. where the program is stored within the memory or storage medium), one or more processors (including a programmed processor), and any other hardware, software, or firmware modules or combinations thereof. For example, the computing system 14 illustrated in FIG. 1 may comprise a computer with a display device 16 and an enclosure 18. The computer can store digital images in binary form (locally, such as in a memory, on a server, or another network connected device). The digital images can also be divided into a matrix of pixels. The pixels can include a digital value of one or more bits, defined by the bit depth. The skilled artisan will appreciate that other computer devices or systems may be utilized and that the computer systems described herein may be communicatively coupled to additional components, e.g. specimen analyzers, microscopes, other imaging systems, automated slide preparation equipment, etc. Some of these additional components and the various computers, networks, etc. that may be utilized are described further herein.

In general, the imaging apparatus 12 (or other image source including pre-scanned images stored in a memory) can include, without limitation, one or more image capture devices. Image capture devices can include, without limitation, a camera (e.g., an analog camera, a digital camera, etc.), optics (e.g., one or more lenses, sensor focus lens groups, microscope objectives, etc.), imaging sensors (e.g., a charge-coupled device (CCD), a complementary metal-oxide semiconductor (CMOS) image sensor, or the like), photographic film, or the like. In digital embodiments, the image capture device can include a plurality of lenses that cooperate to provide on-the-fly focusing. An image sensor, for example, a CCD sensor, can capture a digital image of the specimen. In some embodiments, the imaging apparatus 12 is a brightfield imaging system, a multispectral imaging (MSI) system or a fluorescent microscopy system. The digitized tissue data may be generated, for example, by an image scanning system, such as a VENTANA iScan HT scanner by VENTANA MEDICAL SYSTEMS, Inc. (Tucson, Ariz.) or other suitable imaging equipment. Additional imaging devices and systems are described further herein. The skilled artisan will appreciate that the digital color image acquired by the imaging apparatus 12 can be conventionally composed of elementary color pixels. Each colored pixel can be coded over three digital components, each comprising the same number of bits, each component corresponding to a primary color, generally red, green or blue, also denoted by the term “RGB” components.
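By way of non-limiting illustration only, the coding of a colored pixel over three components of equal bit width may be sketched as follows. The function names `pack_rgb` and `unpack_rgb`, the default of 8 bits per component, and the red-green-blue packing order are assumptions made for this example and are not taken from the present disclosure.

```python
def pack_rgb(r, g, b, bits=8):
    """Pack three equal-width color components into a single integer pixel value.

    The R, G, B ordering (red in the most significant bits) is an assumption
    made for illustration only.
    """
    limit = (1 << bits) - 1
    for component in (r, g, b):
        if not 0 <= component <= limit:
            raise ValueError(f"component {component} does not fit in {bits} bits")
    return (r << (2 * bits)) | (g << bits) | b


def unpack_rgb(pixel, bits=8):
    """Recover the (r, g, b) components from a packed pixel value."""
    mask = (1 << bits) - 1
    return (pixel >> (2 * bits)) & mask, (pixel >> bits) & mask, pixel & mask


# Example: a 24-bit pixel (8 bits per component).
pixel = pack_rgb(255, 128, 0)
components = unpack_rgb(pixel)
```

With 8 bits per component, each pixel occupies 24 bits; other bit depths may be selected through the `bits` argument.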

FIG. 2 provides an overview of the various modules utilized within the presently disclosed digital pathology system. In some embodiments, the digital pathology system 200 employs a computer device or computer-implemented method having one or more processors 220 and at least one memory 201, the at least one memory 201 storing non-transitory computer-readable instructions for execution by the one or more processors to cause the one or more processors 220 to execute instructions (or stored data) in one or more modules (e.g. modules 202 through 210).

With reference to FIGS. 2, 3A and 3B, the present disclosure provides a computer-implemented method of identifying lymphocytes within an image of a biological sample having at least one stain and deriving at least one shape metric which serves as a surrogate for lymphocyte motility. In some embodiments, the method may include: (a) running an imaging module 202 adapted to generate image data of a stained biological sample (e.g. a sample stained for at least the presence of one lymphocyte biomarker, such as CD3, CD4, CD8, etc.) (step 301); (b) running an unmixing module 203 to provide image channel images corresponding to a particular stain or biomarker; (c) running a cell detection module 204 to at least detect lymphocytes within the image of the stained biological sample (step 302), where the cell detection module 204 includes a trained classifier (e.g. SVM or Random Forest, as described herein); (d) running a segmentation module 205 to generate a foreground segmentation mask based on the detected lymphocytes (step 303); (e) running a shape metric derivation module 207 to compute a shape metric for each detected and segmented lymphocyte (step 304); (f) running a labeling module 208 to generate labels and/or to derive the coordinates of identified lymphocytes; and (g) associating the at least one derived shape metric for each detected and segmented lymphocyte with the respective lymphocyte's coordinate position (e.g. a coordinate of a center seed point or the coordinates of the outline of the lymphocyte) and/or location information. The data may be stored in a database 240 (step 305).
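By way of non-limiting illustration only, the association of a derived shape metric with a lymphocyte's coordinate position and its storage in a database may be sketched as follows. The record fields, the table name `lymphocytes`, the function names, and the use of an in-memory SQLite database are all hypothetical choices made for this example; the present disclosure does not prescribe a particular schema or database engine.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class LymphocyteRecord:
    """Hypothetical record tying a shape metric to a detected lymphocyte."""
    cell_id: int
    x: float             # x coordinate of the center seed point
    y: float             # y coordinate of the center seed point
    shape_metric: float  # illustrative value only, e.g. a roundness score
    label: str           # e.g. a label assigned by a labeling module


def store_records(conn, records):
    """Create the (hypothetical) table if needed and insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lymphocytes ("
        "cell_id INTEGER PRIMARY KEY, x REAL, y REAL, "
        "shape_metric REAL, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO lymphocytes VALUES (?, ?, ?, ?, ?)",
        [astuple(record) for record in records],
    )
    conn.commit()


def query_by_metric(conn, min_metric):
    """Retrieve all stored lymphocytes whose metric is at least min_metric."""
    cursor = conn.execute(
        "SELECT cell_id, x, y, shape_metric, label FROM lymphocytes "
        "WHERE shape_metric >= ? ORDER BY cell_id",
        (min_metric,),
    )
    return cursor.fetchall()


# Example usage with an in-memory database.
connection = sqlite3.connect(":memory:")
store_records(connection, [
    LymphocyteRecord(1, 10.0, 20.0, 0.35, "CD8+"),
    LymphocyteRecord(2, 30.5, 40.5, 0.90, "CD8+"),
])
rounder_cells = query_by_metric(connection, 0.5)
```

Storing the coordinate and metric in the same row allows later retrieval of lymphocytes by metric value or by location without re-running detection and segmentation.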

The skilled artisan will also appreciate that additional modules may be incorporated into the workflow as needed. For example, an overlay generation module 209 may be run such that a visual representation of the detected lymphocytes and/or an indicia which corresponds to a value of the derived shape metric may be superimposed over the image (e.g. a color-coded seed center; a color-coded shape; a color-coded “filling” of the outline of the detected lymphocyte, etc.). In addition, a cell classification module 206 may be run such that the detected and segmented lymphocytes may be further characterized, e.g. as cytotoxic T cells, helper T cells, etc. Also, a scoring module 210 may be run to score derived image features, e.g. to score a percent positivity, a membrane or nuclear staining intensity (e.g. staining intensity of a lymphocyte biomarker), or to provide an H-score.

Of course, any module may be run more than once. For example, the cell detection module 204 and the cell classification module 206 may be run a first time to detect and classify lymphocytes and then run a second time to detect and classify tumor cells.

The skilled artisan will also appreciate that additional modules or databases not depicted in FIG. 2 may be incorporated into the workflow. For example, an image pre-processing module may be run to apply certain filters to the acquired images or to identify certain histological and/or morphological structures within the tissue samples. In addition, a region of interest selection module may be utilized to select a particular portion of an image for analysis.

Image Acquisition Module

In some embodiments, as an initial step, and with reference to FIG. 2, the digital pathology system 200 runs an imaging module 202 to capture images or image data (such as from a scanning device 12) of a biological sample having one or more stains (step 301). In some embodiments, the images received or acquired are RGB images or multispectral images (e.g. multiplex brightfield and/or dark field images). In some embodiments, the images captured are stored in memory 201.

The images or image data (used interchangeably herein) may be acquired using the scanning device 12, such as in real-time. In some embodiments, the images are acquired from a microscope or other instrument capable of capturing image data of a specimen-bearing microscope slide, as noted herein. In some embodiments, the images are acquired using a 2D scanner, such as one capable of scanning image tiles, or a line scanner capable of scanning the image in a line-by-line manner, such as the VENTANA DP 200 scanner. Alternatively, the images may be images that have been previously acquired (e.g. scanned) and stored in a memory 201 (or, for that matter, retrieved from a server via network 20).

The biological sample may be stained through application of one or more stains, and the resulting image or image data comprises signals corresponding to each of the one or more stains. Indeed, the biological sample may have been stained in a multiplex assay for two or more stains, in addition to or including any counterstains.

As the skilled artisan will appreciate, a biological sample may be stained for different types of nuclei and/or cell membrane biomarkers. Methods for staining tissue structures and guidance in the choice of stains appropriate for various purposes are discussed, for example, in “Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989)” and “Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Intersciences (1987),” the disclosures of which are incorporated herein by reference.

In some embodiments, the biological samples are stained for at least two biomarkers. In other embodiments, the biological samples are stained for the presence of at least two biomarkers and also stained with a primary stain (e.g. hematoxylin). In some embodiments, the biological samples are stained for the presence of at least two lymphocyte biomarkers. In other embodiments, the biological samples are stained for the presence of at least two lymphocyte biomarkers and for the presence of an additional biomarker to facilitate further lymphocyte differentiation. In yet other embodiments, the biological samples are stained for the presence of at least two lymphocyte biomarkers and for the presence of at least one tumor biomarker. In further embodiments, the biological samples are stained for the presence of at least two lymphocyte biomarkers, an additional biomarker to facilitate lymphocyte differentiation, and for the presence of at least one tumor biomarker.

In some embodiments, the samples are stained for the presence of at least a lymphocyte marker. Lymphocyte markers include CD3, CD4, and CD8. In general, CD3 is the “universal marker” for T cells. In some embodiments, further analysis (staining) is performed to identify a specific type of T cell, e.g. regulatory, helper, or cytotoxic T cell. For example, CD3+ T-cells can be further distinguished as being cytotoxic T-lymphocytes positive for the CD8 biomarker (CD8 is a specific marker for cytotoxic T lymphocytes). CD3+ T cells can also be distinguished as being cytotoxic T-lymphocytes positive for Perforin (Perforin is a membranolytic protein that is expressed in the cytoplasmic granules of cytotoxic T cells and natural killer cells). Cytotoxic T cells are effector cells that actually “kill” tumor cells. They are believed to act by direct contact to introduce the digestive enzyme granzyme B into the tumor cell cytoplasm, thereby killing it. Similarly, CD3+ T cells can be further distinguished as regulatory T cells positive for the FOXP3 biomarker. FOXP3 is a nuclear transcription factor that is the most specific marker for regulatory T cells. Likewise, CD3+ T cells may be further distinguished as helper T cells positive for the CD4 biomarker.

In view of the foregoing, the sample is stained for one or more immune cell markers including at least CD3 or total lymphocytes as detected by hematoxylin & eosin staining. In some embodiments, at least one additional T cell specific marker may also be included, such as CD8 (marker for cytotoxic T-lymphocytes), CD4 (marker for helper T-lymphocytes), FOXP3 (marker for regulatory T-lymphocytes), CD45RA (marker for naïve T-lymphocytes), and CD45RO (marker for memory T-lymphocytes). In one specific embodiment, at least two markers including human CD3 (or total lymphocytes as detected by H&E staining) and human CD8 are used, in which case a single section of the tumor tissue may be labeled with both markers, or serial sections may be used. In other cases, at least one of the immune cell biomarkers is lymphocytes identified in a hematoxylin & eosin stained section.

In some embodiments, the samples are stained for the presence of a lymphocyte biomarker and a tumor biomarker. For example, in epithelial tumors (carcinomas), cytokeratin staining identifies tumor cells as well as the normal epithelium. This information, together with the fact that tumor cells abnormally overexpress the cytokeratins compared to normal epithelial cells, allows one to identify tumor versus normal tissue. For melanoma tissue of neuroectodermal origin, the S100 biomarker serves a similar purpose.

T-cells, for example CD8-positive cytotoxic T-cells, can be further distinguished by a variety of biomarkers that include PD-1, TIM-3, LAG-3, CD28, and CD57. As such, in some embodiments, T-cells are stained with at least one of a variety of lymphocyte biomarkers (e.g., CD3, CD4, CD8, FOXP3) for their identification, and additional biomarkers (LAG-3, TIM-3, PD-L1, etc.) for further differentiation.

In some embodiments, the biological samples are stained for a lymphocyte biomarker and PD-L1. For example, tumor cells can be distinguished as being positive for the biomarker PD-L1, which is believed to impact the interaction of tumor cells and immune cells.

Chromogenic stains may comprise Hematoxylin, Eosin, Fast Red, or 3,3′-Diaminobenzidine (DAB). In some embodiments, the tissue sample is stained with a primary stain (e.g. hematoxylin). In some embodiments, the tissue sample is also stained with a secondary stain (e.g. eosin). In some embodiments, the tissue sample is stained in an IHC assay for a particular biomarker. Of course, the skilled artisan will appreciate that any biological sample may also be stained with one or more fluorophores.

A typical biological sample is processed in an automated staining/assay platform that applies a stain to the sample. There are a variety of commercial products on the market suitable for use as the staining/assay platform, one example being the Discovery™ product of Ventana Medical Systems, Inc. (Tucson, Ariz.). The camera platform may also include a bright field microscope, such as the VENTANA iScan HT or the VENTANA DP 200 scanners of Ventana Medical Systems, Inc., or any microscope having one or more objective lenses and a digital imager. Other techniques for capturing images at different wavelengths may be used. Further camera platforms suitable for imaging stained biological specimens are known in the art and commercially available from companies such as Zeiss, Canon, Applied Spectral Imaging, and others, and such platforms are readily adaptable for use in the system, methods and apparatus of this subject disclosure.

In some embodiments, the input images are masked such that only tissue regions are present in the images. In some embodiments, a tissue region mask is generated to mask non-tissue regions from tissue regions. In some embodiments, a tissue region mask may be created by identifying the tissue regions and automatically or semi-automatically (i.e., with minimal user input) excluding the background regions (e.g. regions of a whole slide image corresponding to glass with no sample, such as where there exists only white light from the imaging source). The skilled artisan will appreciate that in addition to masking non-tissue regions from tissue regions, the tissue masking module may also mask other areas of interest as needed, such as a portion of a tissue identified as belonging to a certain tissue type or belonging to a suspected tumor region. In some embodiments, a segmentation technique is used to generate the tissue region masked images by masking tissue regions from non-tissue regions in the input images. Suitable segmentation techniques are as such known from the prior art (cf. Digital Image Processing, Third Edition, Rafael C. Gonzalez, Richard E. Woods, chapter 10, page 689 and Handbook of Medical Imaging, Processing and Analysis, Isaac N. Bankman, Academic Press, 2000, chapter 2). In some embodiments, an image segmentation technique is utilized to distinguish between the digitized tissue data and the slide in the image, the tissue corresponding to the foreground and the slide corresponding to the background. In some embodiments, the component computes the Area of Interest (AOI) in a whole slide image in order to detect all tissue regions in the AOI while limiting the amount of background non-tissue area that is analyzed. A wide range of image segmentation techniques (e.g., HSV color-based image segmentation, Lab image segmentation, mean-shift color image segmentation, region growing, level set methods, fast marching methods, etc.) can be used to determine, for example, boundaries of the tissue data and non-tissue or background data. Based at least in part on the segmentation, the component can also generate a tissue foreground mask that can be used to identify those portions of the digitized slide data that correspond to the tissue data. Alternatively, the component can generate a background mask used to identify those portions of the digitized slide data that do not correspond to the tissue data.

This identification may be enabled by image analysis operations such as edge detection, etc. A tissue region mask may be used to remove the non-tissue background noise in the image, for example the non-tissue regions. In some embodiments, the generation of the tissue region mask comprises one or more of the following operations (but not limited to the following operations): computing the luminance of the low resolution analysis input image, producing a luminance image, applying a standard deviation filter to the luminance image, producing a filtered luminance image, and applying a threshold to the filtered luminance image, such that pixels with a luminance above a given threshold are set to one, and pixels below the threshold are set to zero, producing the tissue region mask. Additional information and examples relating to the generation of tissue region masks are disclosed in US Publication No. 2017/0154420, entitled “An Image Processing Method and System for Analyzing a Multi-Channel Image Obtained from a Biological Tissue Sample Being Stained by Multiple Stains,” the disclosure of which is hereby incorporated by reference herein in its entirety.
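By way of non-limiting illustration only, the luminance, standard deviation filtering, and thresholding operations described above may be sketched as follows. The function name `tissue_region_mask`, the Rec. 601 luminance weights, the 5×5 window, and the threshold value are assumptions chosen for this example; they are not values specified in the present disclosure or in the incorporated reference.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def tissue_region_mask(rgb, window=5, threshold=2.0):
    """Return a binary mask (1 = tissue, 0 = background) for an RGB image.

    rgb: array of shape (H, W, 3). `window` must be odd.
    """
    rgb = np.asarray(rgb, dtype=float)

    # Step 1: luminance image (Rec. 601 weights, an assumption).
    luminance = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

    # Step 2: standard deviation filter over a sliding window. The image is
    # reflect-padded so that the output has the same shape as the input.
    pad = window // 2
    padded = np.pad(luminance, pad, mode="reflect")
    neighborhoods = sliding_window_view(padded, (window, window))
    filtered = neighborhoods.std(axis=(-2, -1))

    # Step 3: threshold the filtered luminance image.
    return (filtered > threshold).astype(np.uint8)


# Example: a uniform white "glass" background with a textured square.
rng = np.random.default_rng(0)
slide = np.full((40, 40, 3), 255.0)
slide[10:30, 10:30] = rng.integers(0, 200, size=(20, 20, 3))
mask = tissue_region_mask(slide)
```

The standard deviation filter responds to local texture, so flat, uniformly bright glass regions fall below the threshold while textured tissue regions exceed it.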

In some embodiments, a region of interest identification module may be used to select a portion of the biological sample for which image data should be acquired, e.g. a region of interest having a large concentration of lymphocytes or a region suspected of having a large concentration of lymphocytes. Methods of determining a region of interest are described in US Publication No. 2017/0154420, the disclosure of which is hereby incorporated by reference herein in its entirety. In general, US Publication No. 2017/0154420 discloses an image processing method for analyzing a multi-channel image obtained from a biological tissue sample being stained by multiple stains, the method comprising: a. unmixing the multi-channel image to provide one unmixed image per channel, b. spatial low pass filtering of at least one of the unmixed images, c. local maximum filtering of the at least one of the spatial low pass filtered unmixed images, d. thresholding the at least one of the spatial low pass filtered unmixed images to identify at least one set of neighboring pixels, and e. defining a region of interest by extracting an image portion of the multi-channel image from an image location given by the set of neighboring pixels, the region of interest having a predetermined size and shape.
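By way of non-limiting illustration only, steps b through e above may be sketched for a single, already unmixed channel as follows. The function name `select_roi`, the box (mean) filter used as the low pass filter, the filter sizes, the threshold, and the square 16×16 region are assumptions made for this example and do not reproduce the implementation of the incorporated reference. The sketch also assumes the image is at least as large as the region of interest.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _neighborhoods(img, size):
    """All size x size neighborhoods of img (edge-padded, size must be odd)."""
    pad = size // 2
    return sliding_window_view(np.pad(img, pad, mode="edge"), (size, size))


def select_roi(unmixed, image, smooth=5, peak=7, threshold=0.5, roi_size=16):
    """Return (top, left, crop) for the strongest local peak, or None."""
    # b. spatial low pass filtering (a simple box filter, an assumption)
    low = _neighborhoods(np.asarray(unmixed, dtype=float), smooth).mean(axis=(-2, -1))
    # c. local maximum filtering of the low pass filtered image
    local_max = _neighborhoods(low, peak).max(axis=(-2, -1))
    # d. thresholding: keep pixels that are local maxima above the threshold
    hits = (low == local_max) & (low > threshold)
    if not hits.any():
        return None
    ys, xs = np.nonzero(hits)
    best = np.argmax(low[ys, xs])
    # e. extract a region of predetermined size and shape around the peak,
    #    clamped so that the region stays inside the image
    height, width = low.shape
    top = int(np.clip(ys[best] - roi_size // 2, 0, height - roi_size))
    left = int(np.clip(xs[best] - roi_size // 2, 0, width - roi_size))
    return top, left, image[top:top + roi_size, left:left + roi_size]


# Example: a single bright 5x5 cluster in an otherwise empty channel.
channel = np.zeros((64, 64))
channel[30:35, 40:45] = 1.0
multi_channel = np.stack([channel] * 3, axis=-1)
result = select_roi(channel, multi_channel)
```

The sketch returns only the strongest peak; a fuller workflow could instead return one region per retained peak.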

Unmixing Module

In some embodiments, the images received as input may be multiplex images, i.e. the image received is of a biological sample stained with more than one stain (e.g. an image stained for the presence of the CD3, CD8, and PD-L1 biomarkers). In these embodiments, and prior to further processing, the multiplex image is first unmixed into its constituent channels, such as with an unmixing module 203, where each unmixed channel corresponds to a particular stain or signal (e.g. in the above example, a CD3 “stain” image channel, a CD8 “stain” image channel, and a PD-L1 “stain” image channel). In some embodiments, the unmixed images (often referred to as “channel images” or “image channel images”) may be used as the input for each module described herein.

In some embodiments, in a sample comprising one or more stains, individual images may be produced for each channel of the one or more stains. Without wishing to be bound by any particular theory, it is believed that these channels highlight different tissue structures in the tissue image; thus, they may be referred to as structural image channels. For example, for a sample stained with hematoxylin and for the presence of CD3 (e.g. stained red using Fast Red) and Perforin (e.g. stained brown using DAB), unmixing would provide at least a hematoxylin image channel image, a CD3/red image channel image, and a Perforin/brown image channel image. The skilled artisan will appreciate that features extracted from these channels are useful in describing the different biological structures present within any image of a tissue (e.g. nuclei, membranes, cytoplasm, etc.).

The multi-spectral image provided by the imaging module 202 is a weighted mixture of the underlying spectral signals associated with the individual biomarkers and noise components. At any particular pixel, the mixing weights are proportional to the biomarker expressions of the underlying co-localized biomarkers at the particular location in the tissue and the background noise at that location. Thus, the mixing weights vary from pixel to pixel. The spectral unmixing methods disclosed herein decompose the multi-channel pixel value vector at each and every pixel into a collection of constituent biomarker end members or components and estimate the proportions of the individual constituent stains for each of the biomarkers.

Unmixing is the procedure by which the measured spectrum of a mixed pixel is decomposed into a collection of constituent spectra, or endmembers, and a set of corresponding fractions, or abundances, that indicate the proportion of each endmember present in the pixel. Specifically, the unmixing process can extract stain-specific channels to determine local concentrations of individual stains using reference spectra that are well known for standard types of tissue and stain combinations. The unmixing may use reference spectra retrieved from a control image or estimated from the image under observation. Unmixing the component signals of each input pixel enables retrieval and analysis of stain-specific channels, such as a hematoxylin channel and an eosin channel in H&E images, or a diaminobenzidine (DAB) channel and a counterstain (e.g., hematoxylin) channel in IHC images. The terms “unmixing” and “color deconvolution” (or “deconvolution”) or the like (e.g. “deconvolving,” “unmixed”) are used interchangeably in the art.

In some embodiments, the multiplex images are unmixed with unmixing module 203 using linear unmixing. Linear unmixing is described, for example, in Zimmermann, “Spectral Imaging and Linear Unmixing in Light Microscopy,” Adv Biochem Engin/Biotechnol (2005) 95:245-265, and in C. L. Lawson and R. J. Hanson, “Solving Least Squares Problems,” Prentice Hall, 1974, Chapter 23, p. 161, the disclosures of which are incorporated herein by reference in their entirety. In linear stain unmixing, the measured spectrum (S(λ)) at any pixel is considered a linear mixture of stain spectral components and equals the sum of the proportions or weights (A) of each individual stain's color reference (R(λ)) that is being expressed at the pixel

$$S(\lambda) = A_1 \cdot R_1(\lambda) + A_2 \cdot R_2(\lambda) + A_3 \cdot R_3(\lambda) + \dots + A_i \cdot R_i(\lambda)$$

which can be expressed more generally in matrix form as

$$S(\lambda) = \sum_i A_i \cdot R_i(\lambda) \quad \text{or} \quad S = R \cdot A$$

If there are M channel images acquired and N individual stains, the columns of the M×N matrix R are the optimal color system as derived herein, the N×1 vector A is the unknown vector of the proportions of individual stains, and the M×1 vector S is the measured multichannel spectral vector at a pixel. In these equations, the signal in each pixel (S) is measured during acquisition of the multiplex image and the reference spectra, i.e. the optimal color system, are derived as described herein. The contributions of various stains (A_i) can be determined by calculating their contribution to each point in the measured spectrum. In some embodiments, the solution is obtained using an inverse least squares fitting approach that minimizes the square difference between the measured and calculated spectra by solving the following set of equations,

$$\frac{\partial}{\partial A_i} \sum_j \left\{ S(\lambda_j) - \sum_i A_i \cdot R_i(\lambda_j) \right\}^2 = 0$$

In this equation, j indexes the detection channels and i indexes the stains. The linear equation solution often involves allowing a constrained unmixing to force the weights (A) to sum to unity.
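By way of non-limiting illustration only, the least squares solution of S = R·A for the stain proportions A at each pixel may be sketched as follows. The function name `unmix_pixels` and the numerical values of the reference matrix are hypothetical and chosen solely for this example. The sketch solves the unconstrained problem only; it does not enforce the sum-to-unity constraint noted above, nor non-negativity of the weights.

```python
import numpy as np


def unmix_pixels(pixels, reference):
    """Estimate stain proportions by unconstrained linear least squares.

    pixels:    array of shape (P, M), one measured M-channel vector S per pixel.
    reference: array of shape (M, N), whose columns are the N stain reference
               spectra (the matrix R).
    Returns an array of shape (P, N) holding the weight vector A for each pixel.
    """
    pixels = np.asarray(pixels, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Solve R . A = S for all pixels at once; lstsq minimizes the squared
    # difference between the measured and calculated spectra.
    weights, *_ = np.linalg.lstsq(reference, pixels.T, rcond=None)
    return weights.T


# Hypothetical reference spectra: M = 3 channels, N = 2 stains.
R = np.array([[0.65, 0.27],
              [0.70, 0.57],
              [0.29, 0.78]])
true_weights = np.array([[0.3, 0.7],
                         [1.0, 0.0]])
measured = true_weights @ R.T      # synthesize S = R . A for two pixels
estimated = unmix_pixels(measured, R)
```

For noise-free synthetic pixels such as these, the estimated weights match the weights used to generate them; with real data, a constrained solver may be preferred.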

In other embodiments, unmixing is accomplished using the methods described in WO2014/195193, entitled “Image Adaptive Physiologically Plausible Color Separation,” filed on May 28, 2014, the disclosure of which is hereby incorporated by reference in its entirety herein. In general, WO2014/195193 describes a method of unmixing by separating component signals of the input image using iteratively optimized reference vectors. In some embodiments, image data from an assay is correlated with expected or ideal results specific to the characteristics of the assay to determine a quality metric. In the case of low quality images or poor correlations against ideal results, one or more reference column vectors in matrix R are adjusted, and the unmixing is repeated iteratively using adjusted reference vectors, until the correlation shows a good quality image that matches physiological and anatomical requirements. The anatomical, physiological, and assay information may be used to define rules that are applied to the measured image data to determine the quality metric. This information includes how the tissue was stained, what structures within the tissue were intended or not intended to be stained, and relationships between structures, stains, and markers specific to the assay being processed. An iterative process results in stain-specific vectors that can generate images that accurately identify structures of interest and biologically relevant information, are free from any noisy or unwanted spectra, and are therefore fit for analysis. The reference vectors are adjusted to within a search space. The search space defines a range of values that a reference vector can take to represent a stain. The search space may be determined by scanning a variety of representative training assays including known or commonly occurring problems and determining high-quality sets of reference vectors for the training assays.

In other embodiments, unmixing is accomplished using the methods described in WO2015/124772, entitled “Group Sparsity Model for Image Unmixing,” filed on Feb. 23, 2015, the disclosure of which is hereby incorporated by reference in its entirety herein. In general, WO2015/124772 describes unmixing using a group sparsity framework, in which fractions of stain contributions from a plurality of colocation markers are modeled within a “same group” and fractions of stain contributions from a plurality of non-colocation markers are modeled in different groups, providing co-localization information of the plurality of colocation markers to the modeled group sparsity framework, solving the modeled framework using a group lasso to yield a least squares solution within each group, wherein the least squares solution corresponds to the unmixing of the colocation markers, and yielding a sparse solution among the groups that corresponds to the unmixing of the non-colocation markers. Moreover, WO2015/124772 describes a method of unmixing by inputting image data obtained from the biological tissue sample, reading reference data from an electronic memory, the reference data being descriptive of the stain color of each one of the multiple stains, reading colocation data from electronic memory, the colocation data being descriptive of groups of the stains, each group comprising stains that can be collocated in the biological tissue sample, and each group forming a group for the group lasso criterion, at least one of the groups having a size of two or above, and calculating a solution of the group lasso criterion for obtaining the unmixed image using the reference data as a reference matrix. In some embodiments, the method for unmixing an image may comprise generating a group sparsity model wherein a fraction of a stain contribution from colocalized markers is assigned within a single group and a fraction of a stain contribution from non-colocalized markers is assigned within separate groups and solving the group sparsity model using an unmixing algorithm to yield a least squares solution within each group.

Cell Detection and Cell Classification Modules

Following image acquisition and/or unmixing, input images or unmixed image channel images are provided to a cell detection module 204 to detect cells and (optionally) subsequently to a cell classification module 206. In some embodiments, the cell detection module 204 is utilized to detect lymphocytes (step 302) within the image based on features within the stained biological sample as noted herein. The procedures and algorithms described herein may be adapted to identify and classify various types of cells or cell nuclei, not just lymphocytes, based on features within the input images, including identifying and classifying tumor cells, non-tumor cells, stroma cells, and non-target stain, etc. The skilled artisan will also appreciate that although lymphocyte detection may occur initially, tumor cells or other types of cells may also be detected either simultaneously or sequentially.

General Methods

In some embodiments, one or more features or metrics (examples are enumerated herein) are derived by detecting nuclei within the input image and/or by extracting features from the detected nuclei and/or from cell membranes (depending, of course, on the biomarker(s) utilized within the input image). In other embodiments, metrics are derived by analyzing cell membrane staining, cell cytoplasm staining, and/or punctate staining (e.g. to distinguish between membrane-staining areas and non-membrane staining areas). As used herein, the term “cytoplasmic staining” refers to a group of pixels arranged in a pattern bearing the morphological characteristics of a cytoplasmic region of a cell. As used herein, the term “membrane staining” refers to a group of pixels arranged in a pattern bearing the morphological characteristics of a cell membrane. As used herein, the term “punctate staining” refers to a group of pixels with strong localized intensity of staining appearing as spots/dots scattered on the membrane area of the cell. The skilled artisan will appreciate that the nucleus, cytoplasm, and membrane of a cell have different characteristics and that differently stained tissue samples may reveal different biological features. Indeed, the skilled artisan will appreciate that certain cell surface receptors can have staining patterns localized to the membrane or localized to the cytoplasm. Thus, a “membrane” staining pattern may be analytically distinct from a “cytoplasmic” staining pattern. Likewise, a “cytoplasmic” staining pattern and a “nuclear” staining pattern may be analytically distinct.

In some embodiments, the images received as input are processed so as to detect nucleus centers (seeds) and/or to segment the nuclei. For example, instructions may be provided to detect nucleus centers based on radial-symmetry voting using techniques commonly known to those of ordinary skill in the art (see Parvin, Bahram, et al. “Iterative voting for inference of structural saliency and characterization of subcellular events.” Image Processing, IEEE Transactions on 16.3 (2007): 615-623, the disclosure of which is incorporated by reference in its entirety herein). In some embodiments, nuclei are detected using radial symmetry to detect centers of nuclei and then the nuclei are classified based on the intensity of stains around the cell centers. For example, an image magnitude may be computed within an image and one or more votes at each pixel are accumulated by adding the summation of the magnitude within a selected region. Mean shift clustering may be used to find the local centers in the region, with the local centers representing actual nuclear locations.

Nuclei detection based on radial symmetry voting is executed on color image intensity data and makes explicit use of the a priori domain knowledge that the nuclei are elliptical shaped blobs with varying sizes and eccentricities. To accomplish this, along with color intensities in the input image, image gradient information is also used in radial symmetry voting and combined with an adaptive segmentation process to precisely detect and localize the cell nuclei. A “gradient” as used herein is, for example, the intensity gradient of pixels calculated for a particular pixel by taking into consideration an intensity value gradient of a set of pixels surrounding said particular pixel. Each gradient may have a particular “orientation” relative to a coordinate system whose x- and y-axis are defined by two orthogonal edges of the digital image. For instance, nuclei seed detection involves defining a seed as a point which is assumed to lie inside a cell nucleus and serve as the starting point for localizing the cell nuclei. The first step is to detect seed points associated with each cell nucleus using a highly robust approach based on radial symmetry to detect elliptical-shaped blobs, i.e., structures resembling cell nuclei. The radial symmetry approach operates on the gradient image using a kernel-based voting procedure. A voting response matrix is created by processing each pixel that accumulates a vote through a voting kernel. The kernel is based on the gradient direction computed at that particular pixel, an expected range of minimum and maximum nucleus size, and a voting kernel angle (typically in the range [π/8, π/4]). In the resulting voting space, local maxima locations that have a vote value higher than a predefined threshold value are saved out as seed points. Extraneous seeds may be discarded later during subsequent segmentation or classification processes. An example of a radial symmetry-based nuclei detection operation is described within WO/2014/140085A1, the disclosure of which is incorporated herein by reference in its entirety. Other such methods are discussed in US Patent Publication No. 2017/0140246, the disclosure of which is again incorporated by reference herein.
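By way of illustration only, the final seed-selection step described above (keeping local maxima of the voting space that exceed a predefined threshold) might be sketched in Python as follows. The function name `find_seeds`, the 8-connected neighborhood, and the handling of plateaus are assumptions made for this sketch and are not taken from the referenced publications; the voting matrix itself is assumed to have been computed already.

```python
def find_seeds(votes, threshold):
    """Return (row, col) positions that are local maxima above `threshold`.

    `votes` is a 2D list (the voting response matrix). A position counts as a
    local maximum when no 8-connected neighbor has a strictly larger value,
    so a flat plateau may yield several seeds (to be pruned later).
    """
    rows, cols = len(votes), len(votes[0])
    seeds = []
    for r in range(rows):
        for c in range(cols):
            v = votes[r][c]
            if v <= threshold:
                continue  # vote value not higher than the threshold
            neighbors = [
                votes[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(v >= n for n in neighbors):
                seeds.append((r, c))
    return seeds
```

Extraneous seeds returned by such a routine would still be subject to the later segmentation or classification steps.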

In some embodiments, once the seeds are detected, a locally adaptive thresholding method may be used, and blobs around the detected centers are created. In some embodiments, other methods may also be incorporated; for example, marker-based watershed algorithms can be used to identify the nuclei blobs around the detected nuclei centers. These and other methods are described in co-pending application PCT/EP2016/051906, published as WO2016/120442, the disclosure of which is incorporated by reference herein in its entirety. A “blob” as used herein can be, for example, a region of a digital image in which some properties, e.g. the intensity or grey value, are constant or vary within a prescribed range of values. All pixels in a blob can be considered in some sense to be similar to each other. For example, blobs may be identified using differential methods which are based on derivatives of a function of position on the digital image, and methods based on local extrema. A nuclear blob is a blob whose pixels and/or whose outline shape indicate that the blob was probably generated by a nucleus stained with the first stain. For example, the radial symmetry of a blob could be evaluated to determine if the blob should be identified as a nuclear blob or as any other structure, e.g. a staining artifact. For example, in case a blob has a lengthy shape and is not radially symmetric, said blob may not be identified as a nuclear blob but rather as a staining artifact. Depending on the embodiment, a blob identified to be a “nuclear blob” may represent a set of pixels which are identified as candidate nuclei, and which may be further analyzed for determining if said nuclear blob represents a nucleus. In some embodiments, any kind of nuclear blob is directly used as an “identified nucleus.”

In some embodiments, filtering operations are applied on the identified nuclei or nuclear blobs for identifying nuclei which do not belong to biomarker-positive tumor cells and for removing said identified non-tumor nuclei from the list of already identified nuclei or not adding said nuclei to the list of identified nuclei from the beginning. Of course, filtering may be performed to retain non-tumor nuclei (e.g. lymphocytes) and exclude tumor nuclei such that non-tumor nuclei (e.g. lymphocytes) may be detected and/or classified. By way of example, additional spectral and/or shape features of the identified nuclear blob may be analyzed to determine if the nucleus or nuclear blob is a nucleus of a tumor cell or not. For example, the nucleus of a lymphocyte is larger than the nucleus of other tissue cells, e.g. of a lung cell. In case the tumor cells are derived from a lung tissue, nuclei of lymphocytes are identified by identifying all nuclear blobs of a minimum size or diameter which is significantly larger than the average size or diameter of a normal lung cell nucleus. The identified nuclear blobs relating to the nuclei of lymphocytes may be removed from (i.e., “filtered out from”) the set of already identified nuclei. By filtering out the nuclei of non-tumor cells, the accuracy of the method may be increased. Depending on the biomarker, non-tumor cells may also express the biomarker to a certain extent and may therefore produce an intensity signal in the first digital image which does not stem from a tumor cell. By identifying and filtering out nuclei which do not belong to tumor cells from the totality of the already identified nuclei, the accuracy of identifying biomarker-positive tumor cells may be increased. These and other methods are described in US Patent Publication 2017/0103521, the disclosure of which is incorporated by reference herein in its entirety.

In some embodiments, the detected nuclei (such as those detected using radial symmetry voting) are then subsequently segmented using thresholds individually computed for each nucleus. For example, Otsu's method may be used for segmentation in a region around an identified nucleus since it is believed that the pixel intensity in the nuclear regions varies. As will be appreciated by those of ordinary skill in the art, Otsu's method determines an optimal threshold by minimizing the intra-class variance. More specifically, Otsu's method is used to automatically perform clustering-based image thresholding, i.e., the reduction of a gray level image to a binary image. The algorithm assumes that the image contains two classes of pixels following a bi-modal histogram (foreground pixels and background pixels). It then calculates the optimum threshold separating the two classes such that their combined spread (intra-class variance) is minimal or, equivalently (because the sum of pairwise squared distances is constant), such that their inter-class variance is maximal.
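For illustration, a minimal Python sketch of Otsu's method operating on the pixels of a region around one nucleus is given below. It searches for the gray level that maximizes the inter-class variance, which is equivalent to minimizing the intra-class variance. The function name `otsu_threshold` and the assumption of integer gray levels in the range 0 to 255 are choices made for this sketch only.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the inter-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    `pixels` is a flat list of integer gray values in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Inter-class variance up to the constant factor 1/total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```

Because a separate threshold is computed from the pixels passed in, calling the function once per nucleus region yields the individually computed thresholds mentioned above.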

Lymphocyte Specific Methods

Methods of detecting lymphocytes and tumor infiltrating lymphocytes are described by Basavanhally et al. and Fatakdawala, H. et al. In Basavanhally et al., “Computerized image-based detection and grading of lymphocytic infiltration in HER2+ breast cancer histopathology,” IEEE Transactions on Biomedical Engineering 57, 642-653 (2010) (the disclosure of which is hereby incorporated by reference herein in its entirety), region growing with high sensitivity and low specificity is used to initially segment lymphocyte nuclei and other objects. Then, maximum a posteriori (MAP) estimation that incorporates size, luminance and spatial proximity information is used to improve the specificity of the detector. Finally, the results from the lymphocyte nuclei detection are input to a classifier that discriminates between the lymphocyte infiltration phenomenon and the baseline level of lymphocytes.

In Fatakdawala, H. et al., “Expectation-maximization-driven geodesic active contour with overlap resolution (EMaGACOR): application to lymphocyte segmentation on breast cancer histopathology,” IEEE Transactions on Biomedical Engineering 57, 1676-1689 (2010) (the disclosure of which is incorporated by reference herein in its entirety), output from a Gaussian mixture clustering algorithm is used to initialize geodesic active contour segmentation. The overlapping objects are resolved by splitting them along high concavity points. Lymphocyte nuclei are distinguished from other objects by texture-based clustering. More specifically, the methods of Fatakdawala utilize RGB values to create a Gaussian mixture model (GMM), wherein an input image is partitioned into four types of regions corresponding to the structures of breast cancer nuclei, lymphocyte nuclei, stroma, and background, respectively. The final membership of each pixel is determined as the one corresponding to the maximum posterior probability. The procedure of lymphocyte nucleus segmentation can be summarized as: 1) segment the image into the four categories of regions using a parametric expectation-maximization algorithm, 2) extract component boundaries using the magnetostatic active contour model with EM-generated segmentation as initialization, 3) split touching nuclei with a concave point detection based shortest path searching algorithm, and 4) discriminate lymphocytes from the others by using K-means clustering with the first-order statistical texture features calculated from the segmented nuclei.

Other methods of detecting lymphocytes are described in U.S. Patent Application Publication No. 2016/0363593, entitled “Methods, Kits, and Systems for Scoring the Immune Response to Cancer,” the disclosure of which is hereby incorporated by reference herein in its entirety.

Classification

In some embodiments, after candidate nuclei are identified, they may be further analyzed to distinguish tumor nuclei from other candidate nuclei (e.g. lymphocyte nuclei and stroma nuclei). In some embodiments, a learnt supervised classifier may be trained to distinguish between different classes of non-tumor nuclei. In some embodiments, the learnt supervised classifier is a Support Vector Machine (“SVM”). In general, an SVM is a classification technique based on statistical learning theory in which a nonlinear input data set is converted into a high dimensional linear feature space via kernels for the non-linear case. A support vector machine projects a set of training data, E, that represents two different classes into a high-dimensional space by means of a kernel function, K. In this transformed data space, nonlinear data are transformed so that a flat line can be generated (a discriminating hyperplane) to separate the classes so as to maximize the class separation. Testing data are then projected into the high-dimensional space via K, and the test data (such as the features or metrics enumerated below) are classified on the basis of where they fall with respect to the hyperplane. The kernel function K defines the method in which data are projected into the high-dimensional space.

In some embodiments, the learnt supervised classifier used to identify tumor nuclei and non-tumor nuclei is a random forest classifier. For example, the random forest classifier may be trained by: (i) creating a training set of tumor and non-tumor nuclei, (ii) extracting features for each nucleus, and (iii) training the random forest classifier to distinguish between tumor nuclei and non-tumor nuclei based on the extracted features (such as those features enumerated herein). The trained random forest classifier may then be applied to classify the nuclei in a test image into tumor nuclei and non-tumor nuclei. Optionally, the random forest classifier may be further trained to distinguish between different classes of non-tumor nuclei, such as lymphocyte nuclei and stromal nuclei (and even between different types of lymphocytes).

Features or metrics which may be derived from input images are enumerated below. After the features are derived, they may be used alone or in conjunction with training data (e.g. during training, example cells are presented together with a ground truth identification provided by an expert observer according to procedures known to those of ordinary skill in the art) to classify nuclei or cells (tumor cells, lymphocytes, etc.).

(A) Metrics Derived from Morphology Features

A “morphology feature” as used herein is, for example, a feature being indicative of the shape or dimensions of a nucleus. Morphological features provide some information about the size and shape of a cell or its nucleus. For example, a morphology feature may be computed by applying various image analysis algorithms on pixels contained in or surrounding a nuclear blob or seed. In some embodiments, the morphology features include area, minor and major axis lengths, perimeter, radius, solidity, etc.

(B) Metrics Derived from Appearance Features

An “appearance feature” as used herein is, for example, a feature having been computed for a particular nucleus by comparing pixel intensity values of pixels contained in or surrounding a nuclear blob or seed used for identifying the nucleus, whereby the compared pixel intensities are derived from different image channels (e.g. a background channel, a channel for the staining of a biomarker, etc.). In some embodiments, the metrics derived from appearance features are computed from percentile values (e.g. the 10th, 50th, and 95th percentile values) of pixel intensities and of gradient magnitudes computed from different image channels. For example, at first, a number P of X-percentile values (X=10, 50, 95) of pixel values of each of a plurality IC of image channels (e.g. three channels: HTX, DAB, luminance) within a nuclear blob representing the nucleus of interest are identified. Computing appearance feature metrics may be advantageous since the derived metrics may describe the properties of the nuclear regions as well as describe the membrane region around the nuclei.
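As a hedged illustration of the percentile-based appearance metrics described above, the following Python sketch computes the 10th, 50th, and 95th percentile values for each image channel within one nuclear blob. The function names, the dictionary-based input layout, and the choice of linear interpolation between ranks are assumptions for this sketch; other percentile conventions would serve equally well.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def appearance_metrics(channels, qs=(10, 50, 95)):
    """Compute percentile metrics per channel for a single nuclear blob.

    `channels` maps a channel name (e.g. "HTX", "DAB", "luminance") to the
    list of pixel values of that channel inside the blob. The result maps
    (channel name, percentile) to the percentile value, i.e. P x IC metrics.
    """
    return {
        (name, q): percentile(values, q)
        for name, values in channels.items()
        for q in qs
    }
```

The same helper could be applied to gradient magnitudes instead of raw intensities by passing gradient values in place of pixel values.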

(C) Metrics Derived from Background Features

A “background feature” is, for example, a feature being indicative of the appearance and/or stain presence in the cytoplasm and cell membrane of the cell comprising the nucleus for which the background feature was extracted from the image. A background feature and a corresponding metric can be computed for a nucleus and a corresponding cell depicted in a digital image, e.g. by identifying a nuclear blob or seed representing the nucleus and analyzing a pixel area (e.g. a ribbon about 20 pixels, or about 9 microns, thick around the nuclear blob boundary) directly adjacent to the identified nuclear blob, therefore capturing the appearance and stain presence in the cytoplasm and membrane of the cell having the identified nucleus together with areas directly adjacent to the cell. These metrics are similar to the nuclear appearance features but are computed in the ribbon around each nucleus boundary rather than within the nuclear blob. It is believed that this ribbon size captures a sufficient amount of background tissue area around the nuclei to provide useful information for nuclei discrimination. These features are similar to those disclosed by J. Kong, et al., “A comprehensive framework for classification of nuclei in digital microscopy imaging: An application to diffuse gliomas,” in ISBI, 2011, pp. 2128-2131, the disclosure of which is incorporated by reference in its entirety herein. It is believed that these features may be used to determine whether the surrounding tissue is stroma or epithelium (such as in H&E stained tissue samples). It is believed that these background features also capture membrane staining patterns, which are useful when the tissue samples are stained with appropriate membrane staining agents.

(D) Metrics Derived from Color

In some embodiments, metrics derived from color include color ratios (e.g. R/(R+G+B)) or color principal components. In other embodiments, metrics derived from color include local statistics of each of the colors (mean/median/variance/std dev) and/or color intensity correlations in a local image window.
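The color ratio and local statistics mentioned above might, for example, be computed as in the following Python sketch. The function names and the use of population (rather than sample) variance are assumptions made for illustration only.

```python
import statistics


def red_ratio(r, g, b):
    """Return the color ratio R/(R+G+B); 0.0 for a pure black pixel."""
    total = r + g + b
    return r / total if total else 0.0


def local_color_stats(values):
    """Return mean, median, variance and std dev of one color channel
    over the pixels of a local image window (given as a flat list)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }
```

Analogous ratios for the green and blue channels follow by exchanging the numerator.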

(E) Metrics Derived from Intensity Features

The group of adjacent cells with certain specific property values is set up between the dark and the white shades of grey colored cells represented in a histopathological slide image. The correlation of the color feature defines an instance of the size class; in this way, the intensity of these colored cells distinguishes the affected cell from its surrounding cluster of dark cells. Examples of texture features are described in PCT Publication No. WO/2016/075095, the disclosure of which is incorporated by reference herein in its entirety.

(F) Spatial Features

In some embodiments, spatial features include a local density of cells; average distance between two adjacent detected cells; and/or distance from a cell to a segmented region.
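A minimal Python sketch of two of these spatial features is given below. It assumes that each detected cell is represented by the (x, y) coordinates of its center, that "adjacent" is interpreted as the nearest neighboring cell, and that local density is the number of cells per unit area within a circular window; these interpretations and the function names are assumptions for illustration.

```python
import math


def nearest_neighbor_distances(points):
    """For each cell center, return the distance to its nearest other cell."""
    if len(points) < 2:
        raise ValueError("at least two cells are required")
    distances = []
    for i, (x, y) in enumerate(points):
        distances.append(
            min(
                math.hypot(x - u, y - v)
                for j, (u, v) in enumerate(points)
                if j != i
            )
        )
    return distances


def mean_adjacent_distance(points):
    """Average distance between each cell and its nearest neighbor."""
    d = nearest_neighbor_distances(points)
    return sum(d) / len(d)


def local_density(points, center, radius):
    """Number of cell centers per unit area within `radius` of `center`."""
    cx, cy = center
    count = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
    return count / (math.pi * radius ** 2)
```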

(G) Metrics Derived from Nuclear Features

The skilled artisan will also appreciate that metrics may also be derived from nuclear features. The computation of such nuclear features is described by Xing et al. “Robust Nucleus/Cell Detection and Segmentation in Digital Pathology and Microscopy Images: A Comprehensive Review,” IEEE Rev Biomed Eng 9, 234-263, January 2016, the disclosure of which is hereby incorporated by reference herein in its entirety. Of course, other features, as known to those of ordinary skill in the art, may be considered and used as the basis for computation of features.

Segmentation Module

After the lymphocytes are detected (step 302) and optionally classified, a foreground segmentation mask may be computed (step 303) using segmentation module 205 such that only the identified lymphocytes are visualized. In some embodiments, the foreground segmentation mask is generated using the methods described in United States Patent Application Publication No. 2017/0337596. In particular, US 2017/0337596 describes computing a foreground segmentation by (1) applying filters to enhance the image such that (a) image regions unlikely to have cells are discarded, and (b) cells within a local region are identified; and (2) further applying optional filters to selectively remove artifacts, remove small blobs, remove discontinuities, fill holes, and split up bigger blobs. In some embodiments, the filters applied are selected from the group consisting of a global thresholding filter, a locally adaptive thresholding filter, morphological operation filters, and watershed transformation filters. In some embodiments, the global thresholding filter is applied first, followed by application of the locally adaptive thresholding filter. In some embodiments, the optional filters to selectively remove artifacts, remove small blobs, remove discontinuities, fill holes, and split up bigger blobs are applied after application of the locally adaptive thresholding filter. In some embodiments, the identification of the individual nuclei further comprises performing a connected-components labeling process on the filtered input image.
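By way of example only, a connected-components labeling process on a filtered binary image might be sketched in Python as below. The breadth-first flood fill, the 4-connectivity, and the function name `label_components` are assumptions of this sketch and are not taken from US 2017/0337596.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary mask.

    `mask` is a 2D list of 0/1 values. Returns (labels, count), where
    `labels` has the same shape as `mask`, background pixels are 0 and
    each connected foreground region carries a distinct label 1..count.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Counting the pixels carrying each label gives a per-blob area, which could in turn drive a small-blob removal filter.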

In some embodiments, the foreground segmentation mask, when applied to the original image or an unmixed image channel image, allows for the visualization and identification of detected lymphocytes (other cell types may be excluded from the mask). In some embodiments, the lymphocytes may be visualized as an outline. The skilled artisan will appreciate that, by having an outline or trace of all identified lymphocytes, an area of each lymphocyte may be calculated.
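As one possible illustration, if the outline of a lymphocyte is available as an ordered list of (x, y) vertices, its area and perimeter could be computed with the shoelace formula as in the following Python sketch. Representing the outline as a closed polygon, and the function names, are assumptions of this sketch.

```python
import math


def outline_area(outline):
    """Area enclosed by a closed polygonal outline (shoelace formula)."""
    twice_area = 0.0
    n = len(outline)
    for i in range(n):
        x0, y0 = outline[i]
        x1, y1 = outline[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def outline_perimeter(outline):
    """Total edge length of a closed polygonal outline."""
    n = len(outline)
    return sum(
        math.hypot(outline[(i + 1) % n][0] - outline[i][0],
                   outline[(i + 1) % n][1] - outline[i][1])
        for i in range(n)
    )
```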

Shape Metric Determination Module

Following the detection and segmentation of lymphocytes using the image analysis module (steps 302 and 303), a shape metric is derived (step 304) for each identified lymphocyte using the shape metric derivation module 207. Essentially, the shape metric is computed such that it provides meaningful data pertaining to the shape of the detected and segmented lymphocytes. In some embodiments, the shape metric is derived such that it provides meaningful data pertaining to an elongate shape of a cell (an “elongateness”). Various shape metrics may be derived that may be used as a surrogate for cell shape (and hence motility). In some embodiments, the shape metrics are selected from the group consisting of a minor axis/major axis aspect ratio, an eccentricity parameter, a circularity parameter, a roundness parameter, and a solidity parameter.

In some embodiments, the shape metric is a minor axis/major axis aspect ratio of a best fit ellipse. Initially, and with reference to FIG. 4, an ellipse is fitted to each lymphocyte, e.g. by fitting an ellipse within an outline provided by a foreground segmentation mask (step 401). Methods of fitting an ellipse to a lymphocyte or an outline are described below. From a best fit ellipse, the length of a major axis and the length of a minor axis may be measured (step 402). The major axis is the longest diameter of a best fit ellipse, while the minor axis is the shortest diameter of the same ellipse. In this way, the lengths of the major and minor axes of the fitted ellipse provide a robust estimate of the largest and shortest extents of a cell's shape. An aspect ratio between the two length measurements may then be ascertained (step 403). For example, the aspect ratio can be characterized by the following equation: aspect ratio = “minor axis length”/“major axis length”.

The ratio of the minor axis length to the major axis length is 1.0 for a symmetric object like a disk. The skilled artisan will appreciate that the value of the aspect ratio becomes smaller and approaches zero for more elongated objects. In short, the minor axis length/major axis length aspect ratio may serve as a robust metric for the shape of the cell and serve as a surrogate for cell motility, e.g. an aspect ratio approaching one may be indicative of a lymphocyte that is not motile, while an aspect ratio approaching zero may be indicative of a lymphocyte that is relatively dynamic and thus more motile or capable of being more motile.
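The following Python sketch illustrates how an aspect ratio might be obtained from the pixels of one segmented lymphocyte. Note that it swaps in a different fitting technique from the Hough-based approaches described elsewhere herein: it uses the ellipse that has the same second-order central moments as the pixel region, which is merely one common way to approximate a best fit ellipse. All function names are hypothetical.

```python
import math


def ellipse_axes_from_pixels(pixels):
    """Return (major, minor) full axis lengths of the ellipse having the
    same second-order central moments as the given (x, y) pixel region."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    common = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_max = (sxx + syy) / 2.0 + common
    lam_min = max((sxx + syy) / 2.0 - common, 0.0)
    # For a uniformly filled ellipse the variance along a semi-axis of
    # length s is s**2 / 4, so the full axis length is 4 * sqrt(variance).
    return 4.0 * math.sqrt(lam_max), 4.0 * math.sqrt(lam_min)


def aspect_ratio(major, minor):
    """Minor axis length / major axis length, in the range [0, 1]."""
    if major <= 0:
        raise ValueError("major axis length must be positive")
    return minor / major
```

Under these assumptions, a round cell yields a ratio near one and an elongated cell yields a ratio that falls toward zero, consistent with the interpretation given above.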

In other embodiments, the shape metric is an eccentricity value of a best fit ellipse. Like the minor axis/major axis aspect ratio, the eccentricity value feature gives an idea about the degree of deviation of the shape from being circular. The linear eccentricity of an ellipse or hyperbola is the distance between its center and either of its two foci. Dividing the linear eccentricity by the semi-major axis length gives the dimensionless eccentricity, which is derived as follows:

$\text{eccentricity} = \frac{\sqrt{a^{2} - b^{2}}}{a},$

where a and b are the major axis length and minor axis length of the equivalent best fit ellipse.

The eccentricity parameter may vary between 0 and 1. It equals zero when b equals a, that is, when the ellipse is a circle. As the ellipse moves away from the circular shape and becomes flatter, ‘a’ assumes increasingly larger values with respect to the value of ‘b,’ and as such the fraction b/a decreases towards the 0 value, and the eccentricity value approaches 1.
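A short Python sketch of the eccentricity computation is given below for illustration; the function name and the input validation are assumptions. Because the expression depends only on the ratio b/a, it gives the same value whether full axis lengths or semi-axis lengths are supplied, provided both are given in the same convention.

```python
import math


def eccentricity(major, minor):
    """Eccentricity sqrt(a^2 - b^2) / a of an ellipse with axes a >= b."""
    if major <= 0 or minor < 0 or minor > major:
        raise ValueError("expected major > 0 and 0 <= minor <= major")
    return math.sqrt(major ** 2 - minor ** 2) / major
```

For example, under these definitions a circle (a equal to b) gives 0, and axes of 5 and 3 give 0.8.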

In some embodiments, the shape metric is a circularity parameter. Circularity is a shape descriptor that can mathematically indicate the degree of similarity to a perfect circle. A value of 1.0 designates a perfect circle. As the circularity value approaches 0.0, the shape is increasingly less circular. Circularity can be defined by the equation:

$\text{Circularity} = 4\pi\,\frac{\lbrack\text{Area}\rbrack}{\lbrack\text{Perimeter}\rbrack^{2}},$

where the area of the objects in a binary image is a scalar whose value corresponds roughly to the total number of non-zero pixels in the image. A Hough transform may be utilized to fit circles to an image. A circle is described by three parameters: radius r and the two center coordinates, x_(c) and y_(c): r²=(x−x_(c))²+(y−y_(c))². Therefore, the Hough space for circles is three-dimensional. The projection of an individual pixel (x_(k),y_(k)) into Hough space is a cone: if the pixel coordinates x and y as in the above equation are assumed to be fixed with x=x_(k) and y=y_(k), the above equation is satisfied by an infinite set of concentric circles with center (x_(k),y_(k)). The radius of the circles increases as the circles are translated along the r-axis. If a number of pixels that lie on a circle in image space are projected into Hough space, the ensuing cones meet at the point (x_(c),y_(c),r). Accumulating votes along the cones (analogous to the sinusoidal traces in the line transform) will therefore yield a maximum at (x_(c),y_(c),r) that can be used to reconstruct the circle.
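The vote accumulation described above can be sketched in Python as follows. In this simplified, brute-force illustration each edge pixel votes, for every candidate center on an integer grid, for the radius equal to its rounded distance from that center, which traces out the pixel's cone in the three-dimensional Hough space. The function name, the integer discretization, and the exhaustive search over centers are assumptions chosen for clarity rather than efficiency.

```python
import math
from collections import Counter


def hough_circle(edge_points, width, height):
    """Return (x_c, y_c, r, votes) of the circle with the most votes.

    `edge_points` is a list of (x, y) edge pixels; candidate centers are
    all integer positions with 0 <= x_c < width and 0 <= y_c < height.
    """
    accumulator = Counter()
    for x, y in edge_points:
        for yc in range(height):
            for xc in range(width):
                # Radius at which this pixel's cone passes over (xc, yc).
                r = round(math.hypot(x - xc, y - yc))
                if r > 0:
                    accumulator[(xc, yc, r)] += 1
    (xc, yc, r), votes = accumulator.most_common(1)[0]
    return xc, yc, r, votes
```

The cones of all pixels lying on one circle intersect in a single accumulator cell, so that cell collects one vote per pixel and appears as the maximum.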

In other embodiments, the shape metric is a roundness parameter. Roundness is similar to circularity but is insensitive to irregular borders along the perimeter of an object. Roundness also takes into consideration the major axis of the best fit ellipse. Roundness can be defined by the equation:

$\text{Roundness} = \frac{4\,\lbrack\text{Area}\rbrack}{\pi\,\lbrack\text{Major axis}\rbrack^{2}}$
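For illustration, circularity and roundness might be computed as in the Python sketch below. The sketch assumes that roundness is normalized so that a perfect circle scores 1.0, i.e. 4·Area/(π·(major axis)²), where the major axis is the full length of the longest diameter of the best fit ellipse; the function names are hypothetical.

```python
import math


def circularity(area, perimeter):
    """4 * pi * area / perimeter^2; equals 1.0 for a perfect circle."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


def roundness(area, major_axis):
    """4 * area / (pi * major_axis^2); equals 1.0 for a perfect circle.

    `major_axis` is the full major axis length of the best fit ellipse,
    so the result does not depend on how ragged the perimeter is.
    """
    if major_axis <= 0:
        raise ValueError("major axis length must be positive")
    return 4.0 * area / (math.pi * major_axis ** 2)
```

Under this normalization, an ellipse with full axis lengths a and b has area π·a·b/4, so its roundness reduces to b/a.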

In some embodiments, a Hough transform or a randomized Hough transform may be used to derive a best fit ellipse, such as a best fit ellipse for each identified lymphocyte. As used herein, the term “ellipse shape” not only refers to an ellipse in the mathematical sense, but also includes the concept of a shape slightly deformed from an ellipse (such as an oval shape). The basic idea of the Hough transform is to implement a voting procedure for all potential curves in an image, and at the termination of the algorithm, curves that do exist in the image will have relatively high voting scores. Said another way, the principal idea of the Hough transform is the accumulation of votes in parameter space, and the Hough transform can be applied to both circles and ellipses, since both shapes can be described analytically. The Hough transform consists of three steps: (i) a pixel in the image is transformed into a parameterized curve; (ii) a valid curve's parameters are binned into an accumulator where the number of curves in a bin equals its score; and (iii) a curve with a maximum score is selected from the accumulator to represent a curve in the image. Methods of ellipse detection in an image are described by Yuen et al., “Ellipse Detection Using the Hough Transform,” AV 1998 doi: 10.5244/C.2.41, the disclosure of which is incorporated by reference herein in its entirety. Additional methods of performing a Hough transform are described in United States Patent Publication No. 2016/0196465, the disclosure of which is incorporated by reference herein in its entirety.

The Randomized Hough transform is different from a “traditional” Hough transform in that it tries to avoid conducting the computationally expensive voting process for every nonzero pixel in the image by taking advantage of the geometric properties of analytical curves, and thus improves the time efficiency and reduces the storage requirement of the original algorithm. The Randomized Hough transform process generally consists of three steps: (i) fit ellipses with randomly selected points; (ii) update the accumulator array and corresponding scores; and (iii) output the ellipses with scores higher than some predefined threshold.

More specifically, the Randomized Hough transform randomly selects n pixels from an image and fits them to a parameterized curve. If the pixels fit within a tolerance, they are added to an accumulator with a score. Once a specified number of pixel sets have been selected, the curve with the best score is selected from the accumulator and its parameters are used to represent a curve in the image. Because only a small random subset of pixels, n, is selected, this method reduces the storage requirements and computational time needed to detect curves in an image. In a Randomized Hough transform, if a curve in the accumulator is similar to the curve being tested, the parameters of the curves are averaged together, and the new average curve replaces the curve in the accumulator. This reduces the difficulty of finding the local maxima in the Hough space because only one point in the Hough space represents a curve, instead of a clump of near points with a local maximum.
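The accumulator update described above might be sketched in Python as follows. The list-based accumulator, the per-parameter tolerance test used to decide whether two curves are "similar", and the function names are assumptions made for this illustration; the candidate parameter tuples are assumed to come from a separate curve-fitting step.

```python
def update_accumulator(accumulator, candidate, tolerance):
    """Merge `candidate` into the accumulator or add it as a new entry.

    `accumulator` is a list of [params, score] entries, where `params` is
    a tuple of curve parameters. If an existing curve is within
    `tolerance` of the candidate in every parameter, the two are averaged,
    the average replaces the stored curve, and its score is incremented.
    """
    for entry in accumulator:
        params, score = entry
        if all(abs(p - c) <= tolerance for p, c in zip(params, candidate)):
            entry[0] = tuple((p + c) / 2.0 for p, c in zip(params, candidate))
            entry[1] = score + 1
            return accumulator
    accumulator.append([tuple(candidate), 1])
    return accumulator


def best_curve(accumulator):
    """Return the (params, score) entry with the highest score."""
    params, score = max(accumulator, key=lambda entry: entry[1])
    return params, score
```

Because similar candidates collapse into a single averaged entry, selecting the best curve reduces to taking the maximum score over a short list.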

By way of example, an ellipse's parameters may be determined from an image starting from finding the center coordinates of the ellipse to determining the semi-major axis' length (a), semi-minor axis' length (b), and half the distance between the foci (c).

Step 1: Select three points, X₁, X₂, and X₃. Three points are randomly selected from the image such that each point has an equal opportunity to be chosen. Three sets of iterations of random numbers are generated from 1 to the length of the image in sub-indices to form sets of three points for each iteration. A sub-index is the number of a cell in a matrix and ranges from 1 to the number of cells in the matrix. This is an alternative form for specifying a matrix cell from the normal row, column form. Only unique random numbers generated for sub-indices are kept to better cover the image, because each iteration requires three random points. If, after throwing away duplicate points, there are not enough points for all iterations specified, random numbers are generated until there are enough. All numbers are kept from this second generation, even if they duplicate the first sets.

Step 2: Determine the equation of the line for each point where the line's slope is the gradient at the point: y=mx+b. This is done by checking the pixels around the point and performing a least squares line fit to them. By way of example, determining the point's line equation can be performed using the MATLAB ‘roipoly’ function to select points in a seven-by-seven region around the point of interest. From the coordinates of these points, the ‘polyfit’ function is used to find the slope m₁ and y-intercept b₁ for the point of interest.

Step 3: Determine the intersection of the tangents passing through point pairs (X₁,X₂) and (X₂,X₃). The tangent intersection points t₁₂ and t₂₃ are found by solving these systems of linear equations for the x and y coordinates:

Tangents X₁ and X₂ for t₁₂:

$\quad\begin{bmatrix}{{{m_{1}x} + b_{1} - y} = 0} \\{{{m_{2}x} + b_{2} - y} = 0}\end{bmatrix}$

Tangents X₂ and X₃ for t₂₃:

$\quad\begin{bmatrix}{{{m_{2}x} + b_{2} - y} = 0} \\{{{m_{3}x} + b_{3} - y} = 0}\end{bmatrix}$

Step 4: Calculate the bisector of the tangent intersection points. This is a line from the tangents' intersection, t, to the midpoint of the two points, m. The midpoint m₁₂ lies halfway between X₁ and X₂. The midpoint coordinate and the tangent intersection coordinate t₁₂ are used to get the bisection line equation. This is found by solving the following equation to find the slope:

${slope} = \frac{m_{y} - t_{y}}{m_{x} - t_{x}}$

and using the slope in the line equation to find the y-intercept:

$b = \text{slope} \cdot x - y = \text{slope} \cdot t_{x} - t_{y}$

The bisection line is then: $y = \text{slope} \cdot x - b$.

Step 5: Find the bisectors' intersection to give the ellipse's center, O. The ellipse's center is located at the intersection of the bisectors. The intersection coordinates are found using the bisector line equations determined in step 4 in the following system of linear equations.

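Ellipse center located at (x,y) derived from the two bisector lines of step 4, where slope₁₂, b₁₂ and slope₂₃, b₂₃ denote the slope and intercept term of the bisectors for point pairs (X₁,X₂) and (X₂,X₃):

$\begin{cases} \text{slope}_{12}\,x - b_{12} - y = 0 \\ \text{slope}_{23}\,x - b_{23} - y = 0 \end{cases}$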
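Steps 2 through 5 can be sketched in a few lines of Python. This is a minimal illustration rather than the disclosed implementation: the function names (`fit_line`, `intersect`, `bisector`, `ellipse_center`) are hypothetical, lines are kept in ordinary slope-intercept form (so the intercept returned by `bisector` is the negative of the term b used in step 4), and vertical tangents or bisectors are not handled.

```python
import math


def fit_line(points):
    """Step 2: least squares fit of y = m*x + b to a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("vertical line: slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def intersect(m1, b1, m2, b2):
    """Steps 3 and 5: solve m1*x + b1 - y = 0 and m2*x + b2 - y = 0."""
    if math.isclose(m1, m2):
        raise ValueError("parallel lines have no unique intersection")
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1


def bisector(point_a, point_b, tangent_a, tangent_b):
    """Step 4: line through the tangent intersection and the midpoint.

    Tangents are (slope, intercept) pairs. The result is returned as
    (slope, intercept) in ordinary y = slope*x + intercept form.
    """
    tx, ty = intersect(*tangent_a, *tangent_b)
    mx = (point_a[0] + point_b[0]) / 2
    my = (point_a[1] + point_b[1]) / 2
    slope = (my - ty) / (mx - tx)
    return slope, ty - slope * tx


def ellipse_center(points, tangents):
    """Step 5: intersect the bisectors for pairs (X1, X2) and (X2, X3)."""
    x1, x2, x3 = points
    l1, l2, l3 = tangents
    first = bisector(x1, x2, l1, l2)
    second = bisector(x2, x3, l2, l3)
    return intersect(*first, *second)
```

In practice the tangent for each point would come from `fit_line` applied to the edge pixels in the neighborhood of that point, which stands in here for the ‘polyfit’ call mentioned in step 2.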

After an ellipse's center (p,q) has been determined, the semi-major axis length and the semi-minor axis length may be determined from the ellipse equation:

A(x−p)²+2B(x−p)(y−q)+C(y−q)²=1

using the three randomly selected points to create three linear equations with respect to A, B, and C. First, the ellipse is translated to the origin to reduce the ellipse equation to: Ax²+2Bxy+Cy²=1. This is done by subtracting p from x and q from y for the three points selected in the beginning, X₁, X₂, and X₃.

Once the ellipse is translated to the origin, the following system of linear equations is solved to find the coefficients A, B, and C:

$\begin{cases} A x_{1}^{2} + 2B x_{1} y_{1} + C y_{1}^{2} = 1 \\ A x_{2}^{2} + 2B x_{2} y_{2} + C y_{2}^{2} = 1 \\ A x_{3}^{2} + 2B x_{3} y_{3} + C y_{3}^{2} = 1 \end{cases}$

Next, solve the following equations for the semi-major axis (a) and semi-minor axis (b):

$a = \sqrt{\left| A^{-1} \right|}$

$b = \sqrt{\left| C^{-1} \right|}$
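The coefficient solve and the axis formulas above can be illustrated with a short Python sketch. The helper names `det3` and `ellipse_axes` are hypothetical, and Cramer's rule is used only to keep the example free of external libraries. Note that the two square-root expressions coincide with the geometric semi-axes when B is zero, that is, for an ellipse aligned with the coordinate axes.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def ellipse_axes(points, center):
    """Return (A, B, C, a, b) for A*x^2 + 2*B*x*y + C*y^2 = 1.

    points: the three selected points X1, X2, X3 as (x, y) pairs.
    center: the ellipse center (p, q) found earlier.
    """
    p, q = center
    # Translate the ellipse to the origin.
    shifted = [(x - p, y - q) for x, y in points]
    rows = [[x * x, 2 * x * y, y * y] for x, y in shifted]
    d = det3(rows)
    if math.isclose(d, 0.0, abs_tol=1e-12):
        raise ValueError("points do not determine A, B, and C uniquely")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side (all ones).
        replaced = [row[:] for row in rows]
        for r in range(3):
            replaced[r][col] = 1.0
        coeffs.append(det3(replaced) / d)
    A, B, C = coeffs
    return A, B, C, math.sqrt(abs(1 / A)), math.sqrt(abs(1 / C))
```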

As an alternative to fitting an ellipse with a Hough transform or a Randomized Hough transform, a parameterless non-iterative ellipse fitting technique may be utilized, such as described by Patraucean et al., “A Parameterless Line Segment and Elliptical Arc Detector with Enhanced Ellipse Fitting,” http://ubee.enseeiht.fr/vision/ELSD/eccv2012-ID576.pdf, the disclosure of which is hereby incorporated by reference herein in its entirety. Patraucean describes an Ellipse and Line Segment Detector (ELSD) having three steps: (1) feature candidates are identified using a heuristic; (2) each candidate has to pass a validation phase; and (3) owing to the multiple families of features addressed, a model selection step is required to choose the best geometric interpretation.

In other embodiments, the shape metric is a solidity parameter. Solidity describes the extent to which a shape is convex or concave. The solidity of a completely convex shape is 1; the farther the solidity deviates from 1, the greater the extent of concavity in the structure. Solidity can be defined by the equation:

$\text{Solidity} = \frac{\text{Area}}{\text{Convex Area}}$
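As a minimal sketch of the solidity ratio, the following Python example computes the area of a traced outline and the area of its convex hull. It assumes the outline is available as an ordered list of (x, y) vertices of a simple polygon; the function names are hypothetical, and the hull is built with Andrew's monotone chain algorithm, which is one of several possible choices.

```python
def polygon_area(vertices):
    """Area of a simple polygon from ordered (x, y) vertices (shoelace formula)."""
    total = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def solidity(outline):
    """Solidity = Area / Convex Area for an ordered polygon outline."""
    hull_area = polygon_area(convex_hull(outline))
    if hull_area == 0:
        raise ValueError("degenerate outline: convex area is zero")
    return polygon_area(outline) / hull_area
```

A square outline yields a solidity of 1, whereas an L-shaped outline, which has a concave corner, yields a value below 1.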

Labeling Module

After the lymphocytes are identified and a shape metric derived for each lymphocyte, a labeling module 208 is used such that the identified lymphocytes may be annotated, labeled, or associated with data, and so that the generated data may be stored in database 240 (step 305). In some embodiments, the labeling module 208 may create a database 240 which is a non-transitory memory that stores data as noted herein. In some embodiments, the database 240 stores the images received as input, the coordinates of any lymphocytes (e.g. a center seed point of the lymphocyte or the coordinates of the outline of the lymphocyte), and any associated data or labels (e.g. derived shape metrics, other metrics such as the area of the lymphocyte or the data points used to calculate any shape metric, staining intensity values, expression scores, tumor cell and lymphocyte classifications, etc.).

In some embodiments, image analysis data describing individual pixels within any identified cell (e.g. a tumor cell or an identified lymphocyte) may also be stored. The skilled artisan will appreciate that the data of all pixels within a particular cell may be averaged to provide an average value of the pixel data within the cell. For example, individual pixels may each have a certain intensity. The intensity of all of the pixels within a particular identified tumor cell corresponding to a first marker may be averaged to provide an average pixel intensity for that marker within the tumor cell. Likewise, the intensity of all of the pixels within a particular identified lymphocyte corresponding to a second marker may be averaged to provide an average pixel intensity for that marker within the lymphocyte. That average pixel intensity for the particular cell (or even group of cells or regions of interest) may be stored in database 240.

In some embodiments, the labeling module 208 may assign a predictive label to each identified lymphocyte. For example, based on the one or more derived shape metrics for each identified lymphocyte, the labeling module may assign a number (e.g. 1 through 10) or a letter (e.g. A through J) indicating on a sliding scale the likelihood that any particular identified lymphocyte is dynamic or motile. Using the above example, a value of A may mean that a particular identified lymphocyte is predicted to be most likely to be motile, while a value of J may mean that a particular identified lymphocyte is predicted to be least likely to be motile, where letters between A and J provide a step-wise indication of the likelihood that a particular identified lymphocyte is motile or not. In addition to assigning a letter or number, other indications may be assigned, such as “+” or “−.”

To achieve the foregoing, the labeling module makes use of one or more predetermined shape value thresholds for each type of shape metric. For example, a predetermined threshold of an eccentricity value may be set to 0.65, and those identified lymphocytes with a derived eccentricity value of greater than 0.65 will be assigned a first motility label, while those having a derived eccentricity value of less than 0.65 will be assigned a second motility label. Of course, ranges of predetermined thresholds may be established, e.g. 0.0 to 0.2; 0.21 to 0.4; 0.41 to 0.6; 0.61 to 0.8; and 0.81 to 1.0, where derived shape metric values for each identified lymphocyte will be compared to the ranges and a label assigned depending on the range into which the derived value falls.
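The threshold comparison and the range lookup can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, a value exactly equal to the threshold is assigned the second label (the text leaves this case open), and values falling between two listed ranges, such as 0.205, are assigned to the higher range.

```python
from bisect import bisect_left

# Upper edges of the example ranges 0.0-0.2, 0.21-0.4, 0.41-0.6, 0.61-0.8, 0.81-1.0.
RANGE_UPPER_EDGES = [0.2, 0.4, 0.6, 0.8, 1.0]


def motility_label(eccentricity, threshold=0.65):
    """Assign the first label above the threshold, otherwise the second.

    Assumption: a value equal to the threshold receives the second label.
    """
    if eccentricity > threshold:
        return "first motility label"
    return "second motility label"


def range_bin(value, upper_edges=RANGE_UPPER_EDGES):
    """Return the 1-based index of the range that contains the value."""
    if not 0.0 <= value <= upper_edges[-1]:
        raise ValueError("shape metric value outside the configured ranges")
    return bisect_left(upper_edges, value) + 1
```

The bin index returned by `range_bin` could then be mapped to whatever label scheme is in use, for example letters or numbers on a sliding scale.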

By way of example, the data that may be generated by the labeling module 208 and stored within database 240 for a particular identified lymphocyte may include: x,y coordinates of the seed center of the lymphocyte; x,y coordinates of an entire outline of a lymphocyte; a calculated area of the lymphocyte; a first derived shape metric; a second derived shape metric; a data point used in calculating either the first or second derived shape metrics; a classification of the lymphocyte as a cytotoxic T cell, a helper T cell, etc.; a predictive label of whether the identified lymphocyte is likely or unlikely to be dynamic or motile; the overall density of lymphocytes within a predefined area of the image, within an entire tissue area, or of the whole slide.

Overlay Generation Module

The skilled artisan will appreciate that the stored analysis results and associated biological features can be later retrieved, and the data may be reported or visualized in various formats. More specifically, the coordinate data of each lymphocyte as well as the derived shape metric for each lymphocyte may be retrieved from the database 240 (along with any other data) such that informative visual representations may be made using an overlay generation module 209. These visualizations are intended to assist a pathologist or histologist in the analysis of a biological sample. In some embodiments, the overlay may be generated for a whole slide image, a particular tissue region or area (such as a tissue region which is believed to be rich in lymphocytes, TILs, or tumor tissue), or based on an area annotated by a pathologist or histologist for further review (e.g. after reviewing the stained slides under a microscope, or one or more serial sections, such as sections stained for the presence of one or more biomarkers and/or a section stained with a primary stain and a counterstain).

Such visualizations are shown in FIGS. 7B, 8B, and 9B, where lymphocytes are distinguished from other cells or tissue, and the color (or the intensity of the color) provides feedback as to whether an identified lymphocyte is elongate or more round. A pathologist or histologist may be able to decipher the visualization and provide an analysis in a comparatively quicker manner than if that same pathologist or histologist had to go through the tedious task of manually identifying lymphocytes and manually ascertaining the shape of each. Not only are lymphocytes and their shapes identified more quickly, it is believed that the systems and methods herein facilitate a more accurate method of identifying lymphocytes and their shape.

In some embodiments, identified lymphocytes are traced. For example, an algorithm may be employed which traces the exterior boundary of an outline of a lymphocyte, such as based on the generated foreground segmentation mask (from step 303). In some embodiments, the outlines may be traced using the MATLAB function bwboundaries (https://www.mathworks.com/help/images/ref/bwboundaries.html). The boundary outlines may each be represented using a separate color or other indicia, where each separate color or other indicia represents a range of values for a recorded derived shape metric. By way of a non-limiting example, lymphocytes having an eccentricity value of between 0.8 and 1.0 may be traced in purple; those lymphocytes having an eccentricity value of between 0.7 and 0.79 may be traced in dark blue; those lymphocytes having an eccentricity value of between 0.6 and 0.69 may be traced in light blue; those lymphocytes having an eccentricity value of between 0.5 and 0.59 may be traced in green; those lymphocytes having an eccentricity value of between 0.4 and 0.49 may be traced in yellow; those lymphocytes having an eccentricity value of between 0.3 and 0.39 may be traced in orange; and those lymphocytes having an eccentricity value of between 0.0 and 0.29 may be traced in red. Of course, and in addition to tracing the outlines of any identified lymphocyte, the boundary created may be filled with a color or other indicia.

In some embodiments, each identified lymphocyte is visualized with a seed point, such as one centered within each identified lymphocyte. Seed points are derived by calculating a centroid or center of mass of each identified lymphocyte (such as based on a derived area of the lymphocyte). Methods of determining centroids of irregular objects are known to those of ordinary skill in the art. Once calculated, the centroid of the lymphocyte is labeled (in addition, the x,y coordinates of the seed point may be stored in a memory or database 240). In some embodiments, the position of the centroid or center of mass may be superimposed on the input image, which may again be a whole slide image or any portion thereof.
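One simple way to obtain such a seed point, shown here only as a sketch, is to average the coordinates of the foreground pixels of a single lymphocyte in the segmentation mask. The function name `mask_centroid` and the nested-list mask representation are assumptions made for illustration.

```python
def mask_centroid(mask):
    """Return the (x, y) centroid of the nonzero pixels of a 2D binary mask.

    mask: list of rows, each row a list of 0/1 values for one lymphocyte.
    x is the column index and y is the row index, both zero-based.
    """
    sum_x = sum_y = count = 0
    for row_index, row in enumerate(mask):
        for col_index, value in enumerate(row):
            if value:
                sum_x += col_index
                sum_y += row_index
                count += 1
    if count == 0:
        raise ValueError("mask contains no foreground pixels")
    return sum_x / count, sum_y / count
```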

Scoring Module

In some embodiments, derived stain intensity values, counts of specific nuclei, or other classification results may be used to determine various marker expression scores, such as percent positivity, an Allred score, or an H-Score, using scoring module 210. Methods for scoring are described in further detail in commonly-assigned and co-pending applications WO/2014/102130A1 “Image analysis for breast cancer prognosis” filed Dec. 19, 2013, and WO/2014/140085A1 “Tissue object-based machine learning system for automated scoring of digital whole slides,” filed Mar. 12, 2014, the contents of each of which are hereby incorporated by reference in their entirety herein. For example, based at least in part on the number of biomarker-positive tumor cells/biomarker-positive non-tumor cells, a score (e.g., a whole-slide score, or a score for an annotated area of an image, such as an area annotated by a pathologist or histologist) can be determined. In some embodiments, for each detected nuclear blob, average blob intensity, color and geometric features, such as area and shape of the detected nuclear blob, may be computed, and the nuclear blobs are classified into tumor nuclei and nuclei of non-tumor cells. The number of identified nuclei output corresponds to the total number of biomarker-positive tumor cells detected in a region, as evidenced by the number of tumor nuclei counted. Other methods of scoring a sample are described in PCT Publication No. WO/2017/093524, and US Patent Publication Nos. 2017/0103521 and 2017/0270666, the disclosures of which are hereby incorporated by reference herein in their entireties.

In embodiments where the samples are stained for the presence of a lymphocyte biomarker and also for the presence of PD-L1, PD-L1 expression may be scored by: (a) identifying tumor cells and lymphocytes in the tumor sample; (b) determining the number of tumor cells and lymphocytes expressing PD-L1 and/or the relative intensity of PD-L1 expression in said cells; and (c) categorizing the tumor according to the PD-L1 expression determined in (b). In some embodiments, the expression of PD-L1 is determined by specifically detecting PD-L1 protein and/or PD-L1 mRNA in the tumor. In some embodiments, the cells are considered to express PD-L1 when the cell has at least partial membrane staining of PD-L1 protein detected by IHC. In some embodiments, the tumor is categorized according to one or both of a modified H-score (MHS) or a modified proportion score (MPS), both computed from step (b) (see US Publication No. 2017/0372117 for additional information, the disclosure of which is hereby incorporated by reference herein in its entirety).

The H-score is, for example, a method of assessing the extent of nuclear immunoreactivity. Depending on the biomarker, different approaches for H-score calculation may be used. To give an illustrative example, the H-score for steroid receptor nuclei can be obtained by the formula: 3×(percentage of strongly staining nuclei) + 2×(percentage of moderately staining nuclei) + (percentage of weakly staining nuclei), giving a range of 0 to 300.
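As a worked illustration of this formula, using hypothetical percentages chosen only for the arithmetic (10% strongly staining, 20% moderately staining, and 30% weakly staining nuclei, with the remaining 40% unstained):

$\text{H-score} = 3 \times 10 + 2 \times 20 + 1 \times 30 = 100$

The upper bound of 300 is reached when all nuclei stain strongly, since $3 \times 100 = 300$, and the lower bound of 0 when no nuclei stain.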

In some embodiments, assigning the MHS comprises (i) estimating, across all of the viable tumor cells and stained mononuclear inflammatory cells in all of the examined tumor nests, four separate percentages for cells that have no staining, weak staining (+1), moderate staining (+2) and strong staining (+3), wherein a cell must have at least partial membrane staining to be included in the weak, moderate or strong staining percentages, and wherein the sum of all four percentages equals 100; and (ii) inputting the estimated percentages into the formula of 1×(percent of weak staining cells)+2×(percent of moderate staining cells)+3×(percent of strong staining cells), and assigning the result of the formula to the tissue section as the MHS; wherein assigning the MPS comprises estimating, across all of the viable tumor cells and mononuclear inflammatory cells in all of the examined tumor nests, the percentage of cells that have at least partial membrane staining of any intensity, and assigning the resulting percentage to the tissue section as the MPS; and wherein if both the MHS and MPS are assigned, the assignments may be made in either order or simultaneously. The four categories “no”, “weak”, “moderate” and “strong” may be defined, for example, as non-overlapping intensity threshold ranges; for example, a cell pixel region may be considered as a cell with “no staining” if the average intensity value is less than 5%, as a cell with “weak staining” if the average intensity value is >5% and <25%, as a cell with “moderate staining” if the average intensity value is >=25% and <75%, and as a cell with “strong staining” if the average intensity value is >=75%.
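The category thresholds and the two formulas can be combined in a short Python sketch. Several points are assumptions made for illustration: the function names are hypothetical, each cell is represented by its average intensity expressed in percent, a value of exactly 5% is treated as weak staining (the example ranges leave that value unassigned), and the intensity category is used as a stand-in for "at least partial membrane staining" when computing the MPS.

```python
from collections import Counter


def staining_category(avg_intensity_pct):
    """Map an average intensity in percent to one of the four categories."""
    if avg_intensity_pct < 5:
        return "none"
    if avg_intensity_pct < 25:  # assumption: exactly 5% counts as weak
        return "weak"
    if avg_intensity_pct < 75:
        return "moderate"
    return "strong"


def mhs_and_mps(cell_intensities_pct):
    """Return (MHS, MPS) for a list of per-cell average intensities."""
    if not cell_intensities_pct:
        raise ValueError("no cells to score")
    counts = Counter(staining_category(v) for v in cell_intensities_pct)
    total = len(cell_intensities_pct)
    pct = {
        name: 100.0 * counts[name] / total
        for name in ("none", "weak", "moderate", "strong")
    }
    # MHS: 1x weak + 2x moderate + 3x strong, so the range is 0 to 300.
    mhs = 1 * pct["weak"] + 2 * pct["moderate"] + 3 * pct["strong"]
    # MPS: percentage of cells with staining of any intensity.
    mps = 100.0 - pct["none"]
    return mhs, mps
```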

In some embodiments, the expression score is an Allred score. The Allred score is a scoring system which looks at the percentage of cells that test positive for hormone receptors, along with how well the receptors show up after staining (this is called “intensity”). This information is then combined to score the sample on a scale from 0 to 8. The higher the score, the more receptors are found and the easier they are to see in the sample.

In other embodiments, the expression score is percent positivity. Again, in the context of scoring a breast cancer sample stained for the PR and Ki-67 biomarkers, the percent positivity for the PR and Ki-67 slides is calculated in a single slide (e.g., the total number of nuclei of cells (e.g., malignant cells) that are stained positive in each field of view in the digital image of a slide are summed and divided by the total number of positively and negatively stained nuclei from each of the fields of view of a digital image) as follows: Percent positivity = number of positively stained cells/(number of positively stained cells + number of negatively stained cells).
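A minimal sketch of this calculation, assuming per-field-of-view counts are available as (positive, negative) pairs, is given below. The function name is hypothetical, and the value returned is the ratio exactly as written in the formula; multiplying it by 100 expresses it as a percentage.

```python
def percent_positivity(fields_of_view):
    """Sum counts over all fields of view and return positive / (positive + negative).

    fields_of_view: iterable of (positive_count, negative_count) pairs.
    """
    fields = list(fields_of_view)
    positive = sum(pos for pos, _ in fields)
    negative = sum(neg for _, neg in fields)
    total = positive + negative
    if total == 0:
        raise ValueError("no stained nuclei were counted")
    return positive / total
```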

Other Components for Practicing Embodiments of the Present Disclosure

The system 200 of the present disclosure may be tied to a specimen processing apparatus that can perform one or more preparation processes on the tissue specimen. The preparation process can include, without limitation, deparaffinizing a specimen, conditioning a specimen (e.g., cell conditioning), staining a specimen, performing antigen retrieval, performing immunohistochemistry staining (including labeling) or other reactions, and/or performing in situ hybridization (e.g., SISH, FISH, etc.) staining (including labeling) or other reactions, as well as other processes for preparing specimens for microscopy, microanalyses, mass spectrometric methods, or other analytical methods.

The processing apparatus can apply fixatives to the specimen. Fixatives can include cross-linking agents (such as aldehydes, e.g., formaldehyde, paraformaldehyde, and glutaraldehyde, as well as non-aldehyde cross-linking agents), oxidizing agents (e.g., metallic ions and complexes, such as osmium tetroxide and chromic acid), protein-denaturing agents (e.g., acetic acid, methanol, and ethanol), fixatives of unknown mechanism (e.g., mercuric chloride, acetone, and picric acid), combination reagents (e.g., Carnoy's fixative, methacarn, Bouin's fluid, B5 fixative, Rossman's fluid, and Gendre's fluid), microwaves, and miscellaneous fixatives (e.g., excluded volume fixation and vapor fixation).

If the specimen is a sample embedded in paraffin, the sample can be deparaffinized using appropriate deparaffinizing fluid(s). After the paraffin is removed, any number of substances can be successively applied to the specimen. The substances can be for pretreatment (e.g., to reverse protein-crosslinking, expose nucleic acids, etc.), denaturation, hybridization, washing (e.g., stringency wash), detection (e.g., link a visual or marker molecule to a probe), amplifying (e.g., amplifying proteins, genes, etc.), counterstaining, coverslipping, or the like.

The specimen processing apparatus can apply a wide range of substances to the specimen. The substances include, without limitation, stains, probes, reagents, rinses, and/or conditioners. The substances can be fluids (e.g., gases, liquids, or gas/liquid mixtures), or the like. The fluids can be solvents (e.g., polar solvents, non-polar solvents, etc.), solutions (e.g., aqueous solutions or other types of solutions), or the like. Reagents can include, without limitation, stains, wetting agents, antibodies (e.g., monoclonal antibodies, polyclonal antibodies, etc.), antigen recovering fluids (e.g., aqueous- or non-aqueous-based antigen retrieval solutions, antigen recovering buffers, etc.), or the like. Probes can be an isolated nucleic acid or an isolated synthetic oligonucleotide, attached to a detectable label or reporter molecule. Labels can include radioactive isotopes, enzyme substrates, co-factors, ligands, chemiluminescent or fluorescent agents, haptens, and enzymes.

The specimen processing apparatus can be an automated apparatus, such as the BENCHMARK XT instrument and SYMPHONY instrument sold by Ventana Medical Systems, Inc. Ventana Medical Systems, Inc. is the assignee of a number of United States patents disclosing systems and methods for performing automated analyses, including U.S. Pat. Nos. 5,650,327, 5,654,200, 6,296,809, 6,352,861, 6,827,901 and 6,943,029, and U.S. Published Patent Application Nos. 20030211630 and 20040052685, each of which is incorporated herein by reference in its entirety. Alternatively, specimens can be manually processed.

After the specimens are processed, a user can transport specimen-bearing slides to the imaging apparatus. In some embodiments, the imaging apparatus is a brightfield imager slide scanner. Examples of brightfield imagers are the iScan HT and DP200 (Griffin) brightfield scanners sold by Ventana Medical Systems, Inc. In automated embodiments, the imaging apparatus is a digital pathology device as disclosed in International Patent Application No.: PCT/US2010/002772 (Patent Publication No.: WO/2011/049608) entitled IMAGING SYSTEM AND TECHNIQUES or disclosed in U.S. Patent Application No. 61/533,114, filed on Sep. 9, 2011, entitled IMAGING SYSTEMS, CASSETTES, AND METHODS OF USING THE SAME. International Patent Application No. PCT/US2010/002772 and U.S. Patent Application No. 61/533,114 are incorporated by reference in their entireties.

The imaging system or apparatus may be a multispectral imaging (MSI) system or a fluorescent microscopy system. The imaging system used here is an MSI. MSI, generally, equips the analysis of pathology specimens with computerized microscope-based imaging systems by providing access to spectral distribution of an image at a pixel level. While there exists a variety of multispectral imaging systems, an operational aspect that is common to all of these systems is a capability to form a multispectral image. A multispectral image is one that captures image data at specific wavelengths or at specific spectral bandwidths across the electromagnetic spectrum. These wavelengths may be singled out by optical filters or by the use of other instruments capable of selecting a pre-determined spectral component including electromagnetic radiation at wavelengths beyond the visible light range, such as, for example, infrared (IR).

An MSI system may include an optical imaging system, a portion of which contains a spectrally-selective system that is tunable to define a pre-determined number N of discrete optical bands. The optical system may be adapted to image a tissue sample, illuminated in transmission with a broadband light source, onto an optical detector. The optical imaging system, which in one embodiment may include a magnifying system such as, for example, a microscope, has a single optical axis generally spatially aligned with a single optical output of the optical system. The system forms a sequence of images of the tissue as the spectrally selective system is being adjusted or tuned (for example with a computer processor) such as to assure that images are acquired in different discrete spectral bands. The apparatus may additionally contain a display in which appears at least one visually perceivable image of the tissue from the sequence of acquired images. The spectrally-selective system may include an optically-dispersive element such as a diffractive grating, a collection of optical filters such as thin-film interference filters or any other system adapted to select, in response to either a user input or a command of the pre-programmed processor, a particular pass-band from the spectrum of light transmitted from the light source through the sample towards the detector.

In an alternative implementation, a spectrally selective system defines several optical outputs corresponding to N discrete spectral bands. This type of system intakes the transmitted light output from the optical system and spatially redirects at least a portion of this light output along N spatially different optical paths in such a way as to image the sample in an identified spectral band onto a detector system along an optical path corresponding to this identified spectral band.

Embodiments of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. Any of the modules described herein may include logic that is executed by the processor(s). “Logic,” as used herein, refers to any information having the form of instruction signals and/or data that may be applied to affect the operation of a processor. Software is an example of logic.

A computer storage medium can be, or can be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or can be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices). The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.

The term “programmed processor” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable microprocessor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus also can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.

A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., an LCD (liquid crystal display), LED (light emitting diode) display, or OLED (organic light emitting diode) display, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. In some implementations, a touch screen can be used to display information and receive input from a user. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be in any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks). For example, the network 20 of FIG. 1 can include one or more local area networks.

The computing system can include any number of clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.

All the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.

Although the present disclosure has been described with reference to several illustrative embodiments, it should be understood that many other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More particularly, reasonable variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the foregoing disclosure, the drawings, and the appended claims without departing from the spirit of the disclosure. In addition to variations and modifications in the component parts and/or arrangements, alternative uses will also be apparent to those skilled in the art.

1. A system for processing image analysis data derived from an image of a biological sample stained for a presence of at least one lymphocyte biomarker, the system comprising: (i) one or more processors, and (ii) at least one memory coupled to the one or more processors, the at least one memory to store computer-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: detecting lymphocytes in the image of the stained biological sample; identifying outlines of the detected lymphocytes by segmenting the detected lymphocytes from other cells within the image; deriving a shape metric based on the identified outlines of each of the detected lymphocytes; associating the derived shape metrics with location information for each of the detected lymphocytes; comparing a value of each of the derived shape metrics to a predetermined threshold value for the derived shape metric; and assigning a predictive cell motility label to each of the detected lymphocytes based on the comparison.
2. The system of claim 1, wherein the shape metric is selected from the group consisting of a minor axis/major axis aspect ratio, an eccentricity parameter, a circularity parameter, a roundness parameter, and a solidity parameter.
3. The system of claim 2, wherein the minor axis/major axis aspect ratio is derived by: (i) fitting an ellipse to the outline of each of the segmented lymphocytes; (ii) calculating a length of the fitted ellipse's minor axis and major axis; and (iii) calculating an aspect ratio between the calculated lengths of the minor and major axes.
4. The system of claim 1, further comprising classifying each of the detected lymphocytes within a predefined area of the image.
5. The system of claim 4, wherein the detected lymphocytes are classified as cytotoxic T-lymphocytes, regulatory T-cells, or T-helper cells.
6. The system of claim 1, wherein the value of the derived shape metric is compared to a series of ranges of predetermined threshold values and wherein each detected lymphocyte is assigned one of a plurality of cell motility labels based on the comparison.
7. A non-transitory computer-readable medium storing instructions for estimating shapes of lymphocytes in a biological sample stained for at least a presence of the lymphocytes comprising: detecting lymphocytes in an image of the stained biological sample; identifying outlines of the detected lymphocytes by segmenting the detected lymphocytes from other cells within the image; deriving a shape metric based on the identified outlines of each of the detected lymphocytes; comparing a value of each of the derived shape metrics to a predetermined threshold value for the derived shape metric; and assigning a predictive cell motility label to each of the detected lymphocytes based on the comparison.
8. The non-transitory computer-readable medium of claim 7, wherein the instructions further comprise associating the derived shape metrics for each of the detected lymphocytes with an x,y coordinate position of the detected lymphocyte from the image.
9. The non-transitory computer-readable medium of claim 7, wherein the shape metric is selected from the group consisting of a minor axis/major axis aspect ratio, an eccentricity parameter, a circularity parameter, a roundness parameter, and a solidity parameter.
10. The non-transitory computer-readable medium of claim 9, wherein the minor axis/major axis aspect ratio is derived by: (i) fitting an ellipse to the outline of each of the segmented lymphocytes; (ii) calculating a length of the fitted ellipse's minor axis and major axis; and (iii) calculating an aspect ratio between the calculated lengths of the minor and major axes.
11. The non-transitory computer-readable medium of claim 10, wherein the ellipse is fitted to the outline of each of the segmented lymphocytes by performing a Hough transform or a Randomized Hough Transform.
12. The non-transitory computer-readable medium of claim 7, wherein the value of the derived shape metric is compared to a series of ranges of predetermined threshold values and wherein each detected lymphocyte is assigned one of a plurality of cell motility labels based on the comparison.
13. The non-transitory computer-readable medium of claim 7, wherein the instructions further comprise generating a representational object for each detected lymphocyte and overlaying the representational objects onto the detected lymphocytes in the image.
14. A method of processing image analysis data derived from an image of a biological specimen stained for a presence of at least one lymphocyte biomarker, the method comprising: detecting lymphocytes in the image; computing a foreground segmentation mask based on the lymphocytes detected within the image; identifying outlines of the detected lymphocytes in the image by filtering the image with the computed foreground segmentation mask; deriving a shape metric for each of the detected lymphocytes based on the identified lymphocyte outlines; associating the derived shape metrics with location information for each of the detected lymphocytes; comparing a value of each of the derived shape metrics to a predetermined threshold value for the derived shape metric; and assigning a predictive cell motility label to each of the detected lymphocytes based on the comparison.
15. The method of claim 14, wherein the shape metric is selected from the group consisting of a minor axis/major axis aspect ratio, an eccentricity parameter, a circularity parameter, a roundness parameter, and a solidity parameter.
16. The method of claim 15, wherein the minor axis/major axis aspect ratio is derived by: (i) fitting an ellipse to the outline of each of the detected lymphocytes; (ii) calculating a length of the fitted ellipse's minor axis and major axis; and (iii) calculating an aspect ratio between the calculated lengths of the minor and major axes.
17. The method of claim 14, further comprising classifying each of the detected lymphocytes within a predefined area of the image.
18. The method of claim 17, wherein the detected lymphocytes are classified as cytotoxic T-lymphocytes, regulatory T-cells, or T-helper cells.
19. The method of claim 14, wherein the value of the derived shape metric is compared to a series of ranges of predetermined threshold values and wherein each detected lymphocyte is assigned one of a plurality of cell motility labels based on the comparison.
20. The method of claim 14, further comprising generating a representational object for each detected lymphocyte and overlaying the representational objects onto the detected lymphocytes in the image.
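
By way of non-limiting illustration only, the deriving, associating, comparing, and assigning operations recited above might be sketched as follows. The sketch assumes that lymphocyte outlines are already available as arrays of x,y points. It fits the ellipse from second-order moments of the outline points (a substitute for the Hough transform or Randomized Hough Transform named above, chosen here only for brevity). The function names, the threshold values, and the label strings are hypothetical and are not taken from this disclosure, as is the mapping of low aspect ratios to a "high" motility label.

```python
import numpy as np

# Hypothetical cut-offs on the minor/major aspect ratio; not values from the disclosure.
ASSUMED_THRESHOLDS = (0.5, 0.8)


def aspect_ratio(outline):
    """Minor axis / major axis ratio of a moment-based ellipse fitted to outline points."""
    pts = np.asarray(outline, dtype=float)            # shape (N, 2), N >= 2
    cov = np.cov(pts - pts.mean(axis=0), rowvar=False)
    minor_var, major_var = np.linalg.eigvalsh(cov)    # eigenvalues in ascending order
    if major_var <= 0.0:
        return 1.0                                    # all points coincide: treat as round
    # Axis lengths scale with the square root of the variance along each principal axis.
    return float(np.sqrt(max(minor_var, 0.0) / major_var))


def motility_label(ratio, thresholds=ASSUMED_THRESHOLDS):
    """Compare the ratio to a series of ranges and return one of several labels."""
    low, high = thresholds
    if ratio < low:
        return "high"            # strongly elongated outline (assumed mapping)
    if ratio < high:
        return "intermediate"
    return "low"                 # nearly round outline (assumed mapping)


def annotate(outlines):
    """Associate each outline's shape metric and label with its x,y centroid."""
    records = []
    for outline in outlines:
        pts = np.asarray(outline, dtype=float)
        x, y = pts.mean(axis=0)
        ratio = aspect_ratio(pts)
        records.append({"x": float(x), "y": float(y),
                        "aspect_ratio": ratio, "label": motility_label(ratio)})
    return records


if __name__ == "__main__":
    # Synthetic outlines standing in for segmented lymphocytes.
    t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
    elongated = np.column_stack([10 + 4 * np.cos(t), 20 + 1 * np.sin(t)])
    round_cell = np.column_stack([50 + 3 * np.cos(t), 60 + 3 * np.sin(t)])
    for record in annotate([elongated, round_cell]):
        print(record)
```

In this sketch each record pairs a derived shape metric and its label with a location, so that the records could be stored or overlaid on the image. Any other listed shape metric (e.g., eccentricity or solidity) could be substituted for the aspect ratio without changing the comparing and assigning steps.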